1. Introduction
Queueing networks (QNs) are systems of theoretical and practical interest in the design of computing systems [Reference Kleinrock14], as well as in the optimization of business processes arising in factories, shops, offices, and hospitals [Reference Buzacott and Shanthikumar6, Reference Koole and Mandelbaum15, Reference Osorio and Bierlaire21]. They are formed by interconnected resources routing and serving jobs, and their behaviour often gives rise to complex families of stochastic (jump) processes. In applications, they provide the means to assess modifications, diagnose performance, and evaluate robustness in multiple service infrastructures.
Formally, a QN is associated with a (Markov) jump process: the piecewise-deterministic model for job counts stationed at each resource. Here, a resource consists of a service station along with a queue, because the processing capabilities of a station are usually limited. Whenever a job is serviced, it is routed from its current resource into a new queue, and we say that the process has jumped. Hence, a jump adds a count unit to one resource by subtracting it from another, and we say that jumps in QNs are coupled. Finally, the process is parametrized by the set of intensity rates, which determine the pace of service across job types and the multiple stations.
Performance evaluation. In applications, the service rates in a QN may be unknown, and the various bottlenecks in the routing mechanism for jobs (across queues) are hard to anticipate. Consequently, network deployments are periodically monitored to collect performance measurements, which often include queue lengths, visit counts, or response times. These measurements are a consequence of the stochastic evolution of the underlying Markov model; thus, they form the quantitative basis for drawing inferences on multiple aspects of the network. This inferential process is referred to as performance evaluation [Reference Cooper8]; it is not limited to estimating the service rates, but also includes (i) quantifying any uncertainty around estimations and (ii) identifying queue length distributions under varying networks conditions.
From a probabilistic viewpoint, the primary inferential objectives are to parametrize the regular conditional probability for stochastic network trajectories or paths, conditioned on the available performance measurements; and to derive closed-form expressions for the induced distributions over (i) intensity rates and (ii) queue lengths over time. In this paper, we will observe that this goal calls for the evaluation of complex intractable integrals, a problem commonly encountered within Bayesian statistical settings [Reference Armero and Bayarri1]. In that context, the approach is generally to deviate from analytical methodology and instead construct numerical representations of the induced distributions (such as density plots and histograms), using intensive Markov chain Monte Carlo (MCMC) procedures [Reference Perez, Hodge and Kypraios22, Reference Sutton and Jordan27]. However, numerical methods must sample the trajectories of the underlying stochastic model, conditioned on measurements. These work well in analogous problems with jump processes observed in mathematical biology [Reference Hobolth and Stone12] or genetics [Reference Golightly and Wilkinson11], but scale poorly to complex multivariate settings common in QNs [Reference Bobbio, Gribaudo and Telek5], which account for coupled jumps, job priorities, service types, and feedback loops. Also, numerical methods are reliant on sufficient readily available measurements to describe the likely behaviour of the system. In real-world applications, performance measurements are scarce [Reference Sutton and Jordan27]; for instance, in large externally managed implementations the monitoring may only be executed end-to-end [Reference Liu, Wynter, Xia and Zhang17], i.e., exclusively for the input and output resources where jobs enter and depart the QN.
Consequently, pragmatic statistical solutions are restricted in scope and often ignore the temporal component of the Markov process. Instead, they construct varied inferential methodologies by relying on the mass representation for the network’s stationary distribution [Reference Kraft, Pacheco-Sanchez, Casale and Dawson16], and the objective is usually only to offer point estimates for the unknown service rates (see [Reference Spinner, Casale, Brosig and Kounev26] for a review).
1.1. Approximate inferential methods
In order to derive closed-form expressions for the induced conditional distributions over service rates or network trajectories, we must resort to approximate inferential settings of significant probabilistic complexity. Here, the idea is to operate under some (suitably parametrized) approximating measure in the probability space. This approximating measure should define alternative sample trajectories for the underlying Markov jump model (for job counts across resources) and should ensure analytical tractability of the various integrals that arise in the inferential problem. However, the reader will note that closed-form formulae derived under some (conveniently defined) approximating measure will not be necessarily representative of the real induced distributions. An important step under an approximate inferential set-up is to minimize the dissimilarity (or divergence) between (i) the proposed approximating measure and (ii) the true regular conditional probability over events, conditioned on performance metrics (which also defines a measure).
This approximate procedure is a popular technique in Bayesian statistics, and the resulting closed-form representations are commonly termed variational approximations [Reference Blei, Kucukelbir and McAuliffe4], because their derivation relies heavily on functional optimization methods from the field of calculus of variations. The choices made in order to parametrize suitable approximating measures are key to success; however, the entire process is convoluted, and there exist no guidelines suited to every kind of stochastic model. The standard choice in the statistical literature relies on the strongest of all independence assumptions (cf. [Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Opper and Sanguinetti18]), describing each marginal resource (within the multivariate Markov jump model) as a completely independent stochastic process. The underlying parametrization is commonly referred to as the mean-field approximating measure, because it draws from the physical mean-field theory, where high-dimensional stochastic models are studied through approximations of multiple independent models in combination. Overall, the method works well in practice and has been shown to retrieve accurate conditional density representations in multiple application domains, including predator–prey models, epidemics, networks, and similar processes [Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Opper and Sanguinetti18, Reference Opper and Sanguinetti19, Reference Wang, Blei and Heckerman28, Reference Zechner30, Reference Zhang, Pan and Rao31].
However, an important commonality across all Markov models addressed in prior work is that jumps in the marginal components take place one at a time. This ensures that the multivariate stochastic process can indeed be approximated by a collection of independent univariate models. Our intend is to approximate the stochastic dynamics for jobs counts across resources in a QN, which are subject to coupled jumps; here, the reader may note that it is probabilistically infeasible to assume full independence, for two independent jump models may not change state at the same time. Currently, there exists no viable alternative or solution that can facilitate an approximate inferential set-up tailored to multivariate systems such as QNs.
Key contributions. In this paper, we will formally describe why coupled jumps hinder the construction of approximating measures as commonly defined in the literature. We will then prove that it is still possible to efficiently enable the approximate inferential task, by considering a novel space expansion treatment of the underlying jump process; in a nutshell, we will do the following:
- 
1. Shift the scope. Traditionally, a QN is formalized as a counting process of jobs queueing in the resources. We instead address a process of job transition counts across resources. 
- 
2. Augment the support space, so that job counts across queues can become negative. 
Finally, we present use cases of our proposed procedure for performance evaluation tasks, applied to example inferential problems where (i) numerical methods do not scale and (ii) approximate variational procedures are unusable. The results within this paper are relevant for single- or multi-class Markovian systems (with exponential or phase-type service times), with either finite or infinite processors, and multiple types of service disciplines and probabilistic routings.
Structure. The rest of the paper is organized as follows. In Section 2 we offer a hierarchical formulation of a queueing system, along with the problem statement. Section 3 introduces an approximating network model and offers a summary of the main results to be presented later in the paper. Sections 4 and 5 include the main contributions of our work; they discuss the treatment of the network system by means of interactions in network resources, and they further present the results, proofs, and technical details that contribute to later algorithmic constructions. In Section 6, we guide the reader through applications of our results within inferential and network evaluation tasks, and in Section 7 we conclude the paper with a discussion.
2. Queueing systems and jump processes
 In the following, we employ shorthand notation for densities, base measures, and distributions whenever these are clear from the context. From here on, let 
 $(\Omega,\mathcal{F})$
 denote a measurable space with the regular conditional probability property, supporting the various rates, trajectories, and observations. A general-form QN comprises some
$(\Omega,\mathcal{F})$
 denote a measurable space with the regular conditional probability property, supporting the various rates, trajectories, and observations. A general-form QN comprises some 
 $M\in\mathbb{N}$
 service stations along with a set of job classes
$M\in\mathbb{N}$
 service stations along with a set of job classes 
 $\mathcal{C}$
. The stations are connected by a network topology that governs the underlying routing mechanisms; when a job has been serviced in one station, it can either queue for service at a different station or depart the network. Such a topology is often defined as a set of routing probability matrices
$\mathcal{C}$
. The stations are connected by a network topology that governs the underlying routing mechanisms; when a job has been serviced in one station, it can either queue for service at a different station or depart the network. Such a topology is often defined as a set of routing probability matrices 
 $\{P^c\}_{c\in\mathcal{C}}$
, with elements
$\{P^c\}_{c\in\mathcal{C}}$
, with elements 
 $p^c_{i,j}$
, for all
$p^c_{i,j}$
, for all 
 $0\leq i,j \leq M$
, that denote the probability that a job in class
$0\leq i,j \leq M$
, that denote the probability that a job in class 
 $c\in\mathcal{C}$
 will immediately transit to queueing station j after service completion in station i. In open queueing systems, the index 0 is used as a virtual external source (and destination) of job arrivals to (and departures from) the network. In closed systems, this index may either not exist, or instead refer to a delay server that routes departing jobs back into the network. Also, it holds that
$c\in\mathcal{C}$
 will immediately transit to queueing station j after service completion in station i. In open queueing systems, the index 0 is used as a virtual external source (and destination) of job arrivals to (and departures from) the network. In closed systems, this index may either not exist, or instead refer to a delay server that routes departing jobs back into the network. Also, it holds that 
 $\sum_{j=0}^M p^c_{i,j} = 1$
, for all
$\sum_{j=0}^M p^c_{i,j} = 1$
, for all 
 $0\leq i\leq M$
,
$0\leq i\leq M$
, 
 $c\in\mathcal{C}$
.
$c\in\mathcal{C}$
.
 We address time-homogeneous Markovian systems that are parametrized by exponential inter-arrival and service times, with non-negative rates 
 $\boldsymbol{\mu} = \{ \mu^c_i \in\mathbb{R}_+  : 0\leq i\leq M, c\in\mathcal{C}\}$
, which may vary across service stations and job classes. The servers in the network stations may have finite or infinite processors, and service disciplines can vary across a range of processor sharing (PS) policies, including first-come first-served (FCFS) and variations such as last-come first-served (LCFS) or service in random order (SIRO). In some cases, FCFS processors may require shared processing times across the various job classes (cf. [Reference Baskett, Chandy, Muntz and Palacios3]). For simplicity and ease of notation, class switching, service priorities, and queue-length-dependent service rates are not discussed in detail; however, these follow naturally, and we later present some examples involving them. Under standard exponential service assumptions, the underlying system behaviour is described by a Markov jump process
$\boldsymbol{\mu} = \{ \mu^c_i \in\mathbb{R}_+  : 0\leq i\leq M, c\in\mathcal{C}\}$
, which may vary across service stations and job classes. The servers in the network stations may have finite or infinite processors, and service disciplines can vary across a range of processor sharing (PS) policies, including first-come first-served (FCFS) and variations such as last-come first-served (LCFS) or service in random order (SIRO). In some cases, FCFS processors may require shared processing times across the various job classes (cf. [Reference Baskett, Chandy, Muntz and Palacios3]). For simplicity and ease of notation, class switching, service priorities, and queue-length-dependent service rates are not discussed in detail; however, these follow naturally, and we later present some examples involving them. Under standard exponential service assumptions, the underlying system behaviour is described by a Markov jump process 
 $X=(X_t)_{t\geq 0}$
 with values defined in a measurable space
$X=(X_t)_{t\geq 0}$
 with values defined in a measurable space 
 $(\mathcal{S},\mathcal{P}(\mathcal{S}))$
. Here,
$(\mathcal{S},\mathcal{P}(\mathcal{S}))$
. Here, 
 $\mathcal{S}$
 denotes a countable set of feasible states in the network, usually infinite in open or mixed systems and finite in closed ones;
$\mathcal{S}$
 denotes a countable set of feasible states in the network, usually infinite in open or mixed systems and finite in closed ones; 
 $\mathcal{P}(\mathcal{S})$
 denotes the power set of
$\mathcal{P}(\mathcal{S})$
 denotes the power set of 
 $\mathcal{S}$
. We allow for
$\mathcal{S}$
. We allow for 
 $\mathcal{S}$
 to support vectors of integers that represent job counts across the various class types and service stations, and denote by
$\mathcal{S}$
 to support vectors of integers that represent job counts across the various class types and service stations, and denote by 
 $X_t^{i,c}$
 the number of class-c jobs in station
$X_t^{i,c}$
 the number of class-c jobs in station 
 $i>0$
 at time
$i>0$
 at time 
 $t\geq 0$
. Note that here we ignore the queue lengths in the external delay
$t\geq 0$
. Note that here we ignore the queue lengths in the external delay 
 $(i=0)$
 within closed systems, since these are uniquely determined given the number of jobs in the remaining stations. The infinitesimal generator matrix Q of X is such that
$(i=0)$
 within closed systems, since these are uniquely determined given the number of jobs in the remaining stations. The infinitesimal generator matrix Q of X is such that 
 \begin{equation*}\mathbb{P}(X_{t+\mathrm{d}t}=x'|X_{t}=x) = \mathbb{I}(x=x') + Q_{x,x'}\mathrm{d}t + o(\mathrm{d}t)\end{equation*}
\begin{equation*}\mathbb{P}(X_{t+\mathrm{d}t}=x'|X_{t}=x) = \mathbb{I}(x=x') + Q_{x,x'}\mathrm{d}t + o(\mathrm{d}t)\end{equation*}
for all 
 $x,x'\in\mathcal{S}$
. This can be an infinite matrix; it is generally sparse, and its entries describe rates for transitions across states in
$x,x'\in\mathcal{S}$
. This can be an infinite matrix; it is generally sparse, and its entries describe rates for transitions across states in 
 $\mathcal{S}$
. Rows in Q must sum to 0 so that
$\mathcal{S}$
. Rows in Q must sum to 0 so that 
 $Q_{x,x'}\geq 0$
 for all
$Q_{x,x'}\geq 0$
 for all 
 $x \neq x'$
, and
$x \neq x'$
, and 
 $Q_x \,{:}\,{\raise-1.5pt{=}}\, Q_{x,x} = - \sum_{x'\in\mathcal{S}: x\neq x'} Q_{x,x'}$
.
$Q_x \,{:}\,{\raise-1.5pt{=}}\, Q_{x,x} = - \sum_{x'\in\mathcal{S}: x\neq x'} Q_{x,x'}$
.
 Hence, jumps in the process X are caused by jobs being routed through stations in the underlying network model. We often say that a state 
 $x'\in\mathcal{S}$
 is accessible from
$x'\in\mathcal{S}$
 is accessible from 
 $x\in\mathcal{S}$
, and write
$x\in\mathcal{S}$
, and write 
 $x\xrightarrow{i,j,c} x'$
 for its corresponding jump, if x
$x\xrightarrow{i,j,c} x'$
 for its corresponding jump, if x
 $^{\prime}$
 may be reached from x by means of a class-c job transition between the stations i and j, in the direction
$^{\prime}$
 may be reached from x by means of a class-c job transition between the stations i and j, in the direction 
 $i\rightarrow j$
. We further denote by
$i\rightarrow j$
. We further denote by 
 \begin{align}\mathcal{T}=\{(i,j,c)\in\{0,\dots,M\}^2\times\mathcal{C}: p_{i,j}^c>0\} \end{align}
\begin{align}\mathcal{T}=\{(i,j,c)\in\{0,\dots,M\}^2\times\mathcal{C}: p_{i,j}^c>0\} \end{align}
the finite set of all feasible job transitions in the system, and we remark that the generator Q of X is populated by some positive real-valued rates 
 $\boldsymbol{\lambda}= \{ \lambda_{\boldsymbol{\eta}} \in\mathbb{R}_+ \, : \, {\boldsymbol{\eta}}\in\mathcal{T}\} $
 that define the intensities for these job routings, with
$\boldsymbol{\lambda}= \{ \lambda_{\boldsymbol{\eta}} \in\mathbb{R}_+ \, : \, {\boldsymbol{\eta}}\in\mathcal{T}\} $
 that define the intensities for these job routings, with 
 $\lambda_{i,j,c} = \mu^c_{i}\cdot p^c_{i,j}$
 for all
$\lambda_{i,j,c} = \mu^c_{i}\cdot p^c_{i,j}$
 for all 
 $(i,j,c)\in\mathcal{T}$
.
$(i,j,c)\in\mathcal{T}$
.
Example 1. In Figure 1 we observe diagrams that illustrate this notation in an open single-class network. On the left, we see three stations with different rates, disciplines, and server counts. The topology P is such that 
 $|\mathcal{T}|=5$
 and
$|\mathcal{T}|=5$
 and 
 $p_{0,1}=1-p_{0,2}\in (0,1)$
,
$p_{0,1}=1-p_{0,2}\in (0,1)$
, 
 $p_{1,3}=p_{2,3}=p_{3,0}=1$
 (
$p_{1,3}=p_{2,3}=p_{3,0}=1$
 (
 $p_{i,j}=0$
 otherwise). On the right, we find the corresponding job transition rates across the four pairs of connected stations. Now, let
$p_{i,j}=0$
 otherwise). On the right, we find the corresponding job transition rates across the four pairs of connected stations. Now, let 
 $\vee$
 and
$\vee$
 and 
 $\wedge$
 constitute short-hand notation for maxima and minima, respectively. In this single-class example, X monitors counts across the stations, so that
$\wedge$
 constitute short-hand notation for maxima and minima, respectively. In this single-class example, X monitors counts across the stations, so that 
 $X_t=(X^{1}_t,X^{2}_t,X^{3}_t)\in\mathcal{S}$
 for all
$X_t=(X^{1}_t,X^{2}_t,X^{3}_t)\in\mathcal{S}$
 for all 
 $t\geq 0$
; also, the generator Q is an infinite matrix with
$t\geq 0$
; also, the generator Q is an infinite matrix with 
 $\textstyle Q_{x,x'}=\lambda_{i,j}\cdot (K_i \wedge x_i )$
 for all pairs
$\textstyle Q_{x,x'}=\lambda_{i,j}\cdot (K_i \wedge x_i )$
 for all pairs 
 $x,x'\in\mathcal{S}$
 with associated transition
$x,x'\in\mathcal{S}$
 with associated transition 
 $x\xrightarrow{i,j} x'$
, where
$x\xrightarrow{i,j} x'$
, where 
 $K_i,x_i\in\mathbb{N}_0$
 denote the number of processors and the queue length within station
$K_i,x_i\in\mathbb{N}_0$
 denote the number of processors and the queue length within station 
 $i\geq 0$
. We finally have
$i\geq 0$
. We finally have 
 $K_1=1, K_2=\infty$
, and
$K_1=1, K_2=\infty$
, and 
 $K_3=2$
; at the virtual source, we always have
$K_3=2$
; at the virtual source, we always have 
 $K_0 \wedge x_0 = 1$
. Thus, note that transition rates in X further depend on the queue lengths and resemble kinetic laws within chemical reaction models [Reference Golightly and Wilkinson11].
$K_0 \wedge x_0 = 1$
. Thus, note that transition rates in X further depend on the queue lengths and resemble kinetic laws within chemical reaction models [Reference Golightly and Wilkinson11].

Figure 1: Left: open bottleneck network with three service stations. Shaded circles indicate servers; empty rectangles indicate queueing areas. The box is a probabilistic junction for the routing of arrivals. Right: job transition intensities across network stations.
2.1. A hierarchical formulation of queueing systems
 Within a hierarchical multilevel formulation, rates in 
 $\boldsymbol{\lambda}$
 have a distribution (or image)
$\boldsymbol{\lambda}$
 have a distribution (or image) 
 $\mathbb{P}_{\boldsymbol{\lambda}}\equiv \boldsymbol{\lambda}_*\mathbb{P}$
 under a reference measure
$\mathbb{P}_{\boldsymbol{\lambda}}\equiv \boldsymbol{\lambda}_*\mathbb{P}$
 under a reference measure 
 $\mathbb{P}$
 on
$\mathbb{P}$
 on 
 $(\Omega,\mathcal{F})$
. We assume this to admit a density
$(\Omega,\mathcal{F})$
. We assume this to admit a density 
 $f_{\boldsymbol{\lambda}}$
 with respect to a base measure that will further induce (by properties of exponential transitions) distributions over the service rates
$f_{\boldsymbol{\lambda}}$
 with respect to a base measure that will further induce (by properties of exponential transitions) distributions over the service rates 
 $\boldsymbol{\mu}$
 and routing topology. Next, note that a network trajectory over a finite interval is a piecewise deterministic jump process, such that
$\boldsymbol{\mu}$
 and routing topology. Next, note that a network trajectory over a finite interval is a piecewise deterministic jump process, such that 
 $X\equiv (\textbf{t},\textbf{x})$
 is represented by a sequence of transition times
$X\equiv (\textbf{t},\textbf{x})$
 is represented by a sequence of transition times 
 $\textbf{t}$
 along with states
$\textbf{t}$
 along with states 
 $\textbf{x}$
. Each pair
$\textbf{x}$
. Each pair 
 $(\textbf{t},\textbf{x})$
 is furthermore a random variable on a measurable space
$(\textbf{t},\textbf{x})$
 is furthermore a random variable on a measurable space 
 $(\mathcal{X},\Sigma_\mathcal{X})$
 supporting finite
$(\mathcal{X},\Sigma_\mathcal{X})$
 supporting finite 
 $\mathcal{S}$
-valued trajectories, and a conditional density
$\mathcal{S}$
-valued trajectories, and a conditional density 
 $f_{X|\boldsymbol{\lambda}}$
 may be defined with respect to a dominating base measure
$f_{X|\boldsymbol{\lambda}}$
 may be defined with respect to a dominating base measure 
 $\mu_\mathcal{X}$
, such that the regular conditional probability
$\mu_\mathcal{X}$
, such that the regular conditional probability 
 $\mathbb{P}(A|\boldsymbol{\lambda})$
,
$\mathbb{P}(A|\boldsymbol{\lambda})$
, 
 $A\in\mathcal{F}$
, satisfies
$A\in\mathcal{F}$
, satisfies 
 \begin{equation*}\mathbb{P}(X^{-1}(B)|\boldsymbol{\lambda}) = \int_B f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x}) \,\mu_\mathcal{X}(d\textbf{t},d\textbf{x})\end{equation*}
\begin{equation*}\mathbb{P}(X^{-1}(B)|\boldsymbol{\lambda}) = \int_B f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x}) \,\mu_\mathcal{X}(d\textbf{t},d\textbf{x})\end{equation*}
for all 
 $B\in\Sigma_{\mathcal{X}}$
 (see Appendix A for details). In this case,
$B\in\Sigma_{\mathcal{X}}$
 (see Appendix A for details). In this case, 
 \begin{align}f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x}) = \pi(x_0)\, e^{Q_{x_{I}}(T-t_I)}\, \prod_{i=1}^I Q_{x_{i-1},x_{i}}\, e^{Q_{x_{i-1}}(t_i-t_{i-1})} \end{align}
\begin{align}f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x}) = \pi(x_0)\, e^{Q_{x_{I}}(T-t_I)}\, \prod_{i=1}^I Q_{x_{i-1},x_{i}}\, e^{Q_{x_{i-1}}(t_i-t_{i-1})} \end{align}
for every pair of ordered times 
 $\textbf{t}=\{0,t_1,\dots,t_I\}$
 in [0, T] and states
$\textbf{t}=\{0,t_1,\dots,t_I\}$
 in [0, T] and states 
 $\textbf{x}=\{x_0,\dots,x_I\}$
. Here,
$\textbf{x}=\{x_0,\dots,x_I\}$
. Here, 
 $\pi(\cdot)$
 denotes an arbitrary distribution over initial states, and
$\pi(\cdot)$
 denotes an arbitrary distribution over initial states, and 
 $Q\equiv Q(\boldsymbol{\lambda})$
 is the matrix of infinitesimal rates associated with fixed values in
$Q\equiv Q(\boldsymbol{\lambda})$
 is the matrix of infinitesimal rates associated with fixed values in 
 $\boldsymbol{\lambda}$
. The QN model is thus fully parametrized by a collection of hyperparameters. Analogous modelling choices for continuous-time Markov chains or Markov jump processes can be found in [Reference Baele, van de Peer and Vansteelandt2, Reference Huelsenbeck, Bollback and Levine13] or [Reference Zhao32], to name a few.
$\boldsymbol{\lambda}$
. The QN model is thus fully parametrized by a collection of hyperparameters. Analogous modelling choices for continuous-time Markov chains or Markov jump processes can be found in [Reference Baele, van de Peer and Vansteelandt2, Reference Huelsenbeck, Bollback and Levine13] or [Reference Zhao32], to name a few.

Figure 2: Closed QN with a single FCFS service station and a delay.
2.2. Network evaluation and problem statement
 Let 
 $T>0$
 denote some arbitrary terminal time and
$T>0$
 denote some arbitrary terminal time and 
 $\textbf{x}_0\in\mathcal{S}$
 an initial state in X. For simplicity, this is assumed to be a 0-valued vector, where no jobs populate the system. Now, let
$\textbf{x}_0\in\mathcal{S}$
 an initial state in X. For simplicity, this is assumed to be a 0-valued vector, where no jobs populate the system. Now, let 
 $0\leq t_1<\dots<t_K\leq T$
 denote some fixed network monitoring times along with observation variables
$0\leq t_1<\dots<t_K\leq T$
 denote some fixed network monitoring times along with observation variables 
 $\{O_k\in\mathcal{O},k=1,\dots,K\}$
, for some arbitrary support set
$\{O_k\in\mathcal{O},k=1,\dots,K\}$
, for some arbitrary support set 
 $\mathcal{O}$
, such that
$\mathcal{O}$
, such that
 \begin{align}\mathbb{P}\left(\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)\big|X\right) = \prod_{k=1}^K \mathbb{P}(O_k^{-1}(\textbf{o}_k)|X)  = \prod_{k=1}^K f_{O|X_{t_k}}(\textbf{o}_k)\end{align}
\begin{align}\mathbb{P}\left(\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)\big|X\right) = \prod_{k=1}^K \mathbb{P}(O_k^{-1}(\textbf{o}_k)|X)  = \prod_{k=1}^K f_{O|X_{t_k}}(\textbf{o}_k)\end{align}
for any sequence of elements 
 $\textbf{o}_1,\dots,\textbf{o}_K$
, where
$\textbf{o}_1,\dots,\textbf{o}_K$
, where 
 $\textbf{o}_k$
 denotes the time-
$\textbf{o}_k$
 denotes the time-
 $t_k$
 network observation across all stations. Hence, observations are conditionally mutually independent given the network states. The term
$t_k$
 network observation across all stations. Hence, observations are conditionally mutually independent given the network states. The term 
 $f_{O|X_{t_k}}$
 stands for a conditional mass function assigned to measurements across the M stations, defined with respect to a counting measure
$f_{O|X_{t_k}}$
 stands for a conditional mass function assigned to measurements across the M stations, defined with respect to a counting measure 
 $\mu_{\mathcal{O}}$
. In this paper it is assumed that
$\mu_{\mathcal{O}}$
. In this paper it is assumed that 
 $f_{O|x}>0$
 (everywhere) for all
$f_{O|x}>0$
 (everywhere) for all 
 $x\in\mathcal{S}$
; however, deterministic observations such as queue lengths can be easily approximated by means of regularized indicator functions. Extensions to continuous settings are straightforward.
$x\in\mathcal{S}$
; however, deterministic observations such as queue lengths can be easily approximated by means of regularized indicator functions. Extensions to continuous settings are straightforward.
Example 2. Consider a vanilla closed network as pictured in Figure 2. It includes a single FCFS service station, 
 $K=1$
 processing unit, and a delay routing a population of N jobs in a closed loop. The evolution of
$K=1$
 processing unit, and a delay routing a population of N jobs in a closed loop. The evolution of 
 $X=(X_t)_{t\geq 0}$
 monitors the total number of jobs within the service station, with
$X=(X_t)_{t\geq 0}$
 monitors the total number of jobs within the service station, with 
 $X_0=0$
. Variables
$X_0=0$
. Variables 
 $O_k=(O_{0,k},O_{1,k})$
 at times
$O_k=(O_{0,k},O_{1,k})$
 at times 
 $t_k$
 approximate a system with deterministic observations of queue lengths (in both station and delay) and are supported on
$t_k$
 approximate a system with deterministic observations of queue lengths (in both station and delay) and are supported on 
 $\mathcal{O}=\{0,\dots,N\}^2$
. The observation model may thus factor across the network components, so that, for example,
$\mathcal{O}=\{0,\dots,N\}^2$
. The observation model may thus factor across the network components, so that, for example, 
 $f_{O|x}(\textbf{o}) = \tilde{f}_{O|N-x}(o_0) \cdot \tilde{f}_{O|x}(o_1)$
 with
$f_{O|x}(\textbf{o}) = \tilde{f}_{O|N-x}(o_0) \cdot \tilde{f}_{O|x}(o_1)$
 with 
 $\tilde{f}_{O|x}(o)=\frac{\epsilon}{N} + \mathbb{I}(o=x) \cdot (1-\frac{N+1}{N}\epsilon)$
, and
$\tilde{f}_{O|x}(o)=\frac{\epsilon}{N} + \mathbb{I}(o=x) \cdot (1-\frac{N+1}{N}\epsilon)$
, and 
 $$\matrix{
   \mathbb{P}{(O_k^{ - 1}({\bf{o}})|X) = \left\{ {\matrix{
   {{{(1 - {\epsilon})}^2},} \hfill & {{o_0} = N - {X_{{t_k}}},{\mkern 1mu} {o_1} = {X_{{t_k}}},} \hfill  \cr 
   {{{({\epsilon}/N)}^2},} \hfill & {{o_0} \ne N - {X_{{t_k}}},{\mkern 1mu} {o_1} \ne {X_{{t_k}}},} \hfill  \cr 
   {(1 -{\epsilon})\cdot{\epsilon}/N} \hfill & {{\rm{otherwise,}}} \hfill  \cr 
 } } \right.} \hfill  \cr 
 } $$
$$\matrix{
   \mathbb{P}{(O_k^{ - 1}({\bf{o}})|X) = \left\{ {\matrix{
   {{{(1 - {\epsilon})}^2},} \hfill & {{o_0} = N - {X_{{t_k}}},{\mkern 1mu} {o_1} = {X_{{t_k}}},} \hfill  \cr 
   {{{({\epsilon}/N)}^2},} \hfill & {{o_0} \ne N - {X_{{t_k}}},{\mkern 1mu} {o_1} \ne {X_{{t_k}}},} \hfill  \cr 
   {(1 -{\epsilon})\cdot{\epsilon}/N} \hfill & {{\rm{otherwise,}}} \hfill  \cr 
 } } \right.} \hfill  \cr 
 } $$
for a slack regularizing parameter 
 $\epsilon>0$
 and all
$\epsilon>0$
 and all 
 $k=1,\dots,K$
. This will account for some
$k=1,\dots,K$
. This will account for some 
 $\epsilon \cdot 100 \%$
 faulty measurements. Finally, note that the framework can be generalized to accommodate different observation types. For simplicity of presentation, in this paper we focus on queue lengths.
$\epsilon \cdot 100 \%$
 faulty measurements. Finally, note that the framework can be generalized to accommodate different observation types. For simplicity of presentation, in this paper we focus on queue lengths.
 Now, let 
 $\mathbb{P}(A|\textbf{o}_1,\dots,\textbf{o}_K)$
,
$\mathbb{P}(A|\textbf{o}_1,\dots,\textbf{o}_K)$
, 
 $A\in\mathcal{F}$
,
$A\in\mathcal{F}$
, 
 $(\textbf{o}_1,\dots,\textbf{o}_K)\in  \mathcal{O}^K$
, denote the regular conditional probability across global events, conditioned on observations. Our primary interest lies in its induced distribution over the intensity rates (which we denote by
$(\textbf{o}_1,\dots,\textbf{o}_K)\in  \mathcal{O}^K$
, denote the regular conditional probability across global events, conditioned on observations. Our primary interest lies in its induced distribution over the intensity rates (which we denote by 
 $\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
) and trajectories. Within (Bayesian) inferential settings, these induced distributions are referred to as posteriors. For simplicity, we will restrict the problem statement to intensity rate posteriors; however, the reader will note that an analogous presentation is easily deduced for network trajectories. The posterior distribution exists and admits a density carried by its corresponding prior
$\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
) and trajectories. Within (Bayesian) inferential settings, these induced distributions are referred to as posteriors. For simplicity, we will restrict the problem statement to intensity rate posteriors; however, the reader will note that an analogous presentation is easily deduced for network trajectories. The posterior distribution exists and admits a density carried by its corresponding prior 
 $\mathbb{P}_{\boldsymbol{\lambda}}$
 (see Appendix A); moreover, the transformation is proportional to a weighted product of network paths, and is defined by the Radon–Nikodym derivative
$\mathbb{P}_{\boldsymbol{\lambda}}$
 (see Appendix A); moreover, the transformation is proportional to a weighted product of network paths, and is defined by the Radon–Nikodym derivative 
 \begin{align}\frac{d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}}{d\mathbb{P}_{\boldsymbol{\lambda}}} = \frac{\int_{\mathcal{X}}\big[\prod_{k=1}^K \mathbb{P}(O_k^{-1}(\textbf{o}_k)|\textbf{t},\textbf{x})\big]\, f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})}{\mathbb{P}(O_1^{-1}(\textbf{o}_1)\cap\dots\cap O_K^{-1}(\textbf{o}_K)) } , \end{align}
\begin{align}\frac{d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}}{d\mathbb{P}_{\boldsymbol{\lambda}}} = \frac{\int_{\mathcal{X}}\big[\prod_{k=1}^K \mathbb{P}(O_k^{-1}(\textbf{o}_k)|\textbf{t},\textbf{x})\big]\, f_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})}{\mathbb{P}(O_1^{-1}(\textbf{o}_1)\cap\dots\cap O_K^{-1}(\textbf{o}_K)) } , \end{align}
which corresponds to Bayes’ equation. There, the denominator denotes a normalizing constant that integrates over trajectories and rates. This transformation will often induce a density representation 
 $f_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
 for the posterior distribution with respect to a suitable (Lebesgue) base. In these cases, we may think of the above derivative as a likelihood ratio. However, this ratio poses a tractability problem; that is, the integral over trajectories cannot be computed analytically and must be approximated. This is a common problem in inferential tasks with jump processes (cf. [Reference Hobolth and Stone12, Reference Rao and Teh24, Reference Perez and Kypraios23]), and proposed solutions often rely on intensive MCMC procedures that iterate between trajectories and parameters, including direct sampling, rejection sampling, and uniformization-based methods. Yet algorithms may be hard to implement, computationally demanding, or applicable only to reduced classes of problems. In the case of QNs, strong temporal dependencies in the stochastic trajectories X impose hard coupling properties amongst rates and paths [Reference Sutton and Jordan27], which limits the applicability of state-of-the-art numerical solutions to the simplest types of network evaluation problems [Reference Perez, Hodge and Kypraios22].
$f_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
 for the posterior distribution with respect to a suitable (Lebesgue) base. In these cases, we may think of the above derivative as a likelihood ratio. However, this ratio poses a tractability problem; that is, the integral over trajectories cannot be computed analytically and must be approximated. This is a common problem in inferential tasks with jump processes (cf. [Reference Hobolth and Stone12, Reference Rao and Teh24, Reference Perez and Kypraios23]), and proposed solutions often rely on intensive MCMC procedures that iterate between trajectories and parameters, including direct sampling, rejection sampling, and uniformization-based methods. Yet algorithms may be hard to implement, computationally demanding, or applicable only to reduced classes of problems. In the case of QNs, strong temporal dependencies in the stochastic trajectories X impose hard coupling properties amongst rates and paths [Reference Sutton and Jordan27], which limits the applicability of state-of-the-art numerical solutions to the simplest types of network evaluation problems [Reference Perez, Hodge and Kypraios22].
 Alternatively, the complex integrals in (5) may be approximated through a variational approach, where we suitably parametrize an alternative approximating measure to 
 $\mathbb{P}$
 that will decompose the integrand into multiple independent and analytically tractable parts. However, as yet there exists no approximating measure design to achieve this goal, in light of the complexity of jump processes induced by networks of queues. In the following, we present theoretical results leading to a new variational inferential design for use with jump processes, which significantly deviates from and overcomes multiple limitations found in standard methods that resort to full-independence assumptions [Reference Opper and Sanguinetti18, Reference Opper and Sanguinetti19, Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Wang, Blei and Heckerman28, Reference Zechner30, Reference Zhang, Pan and Rao31].
$\mathbb{P}$
 that will decompose the integrand into multiple independent and analytically tractable parts. However, as yet there exists no approximating measure design to achieve this goal, in light of the complexity of jump processes induced by networks of queues. In the following, we present theoretical results leading to a new variational inferential design for use with jump processes, which significantly deviates from and overcomes multiple limitations found in standard methods that resort to full-independence assumptions [Reference Opper and Sanguinetti18, Reference Opper and Sanguinetti19, Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Wang, Blei and Heckerman28, Reference Zechner30, Reference Zhang, Pan and Rao31].
3. Overview of results
 Note that under the natural measure 
 $\mathbb{P}$
 tied to the infinitesimal generator Q, an underlying Markov jump process X as introduced in Section 2 is supported in a set
$\mathbb{P}$
 tied to the infinitesimal generator Q, an underlying Markov jump process X as introduced in Section 2 is supported in a set 
 $\mathcal{S}$
 of feasible vectors of integers, which is often just
$\mathcal{S}$
 of feasible vectors of integers, which is often just 
 $\mathcal{S}=\mathbb{N}_0^{|\mathcal{C}|\times M}$
. Further recall that
$\mathcal{S}=\mathbb{N}_0^{|\mathcal{C}|\times M}$
. Further recall that 
 $\mathcal{T}$
 in (1) corresponds to possible transition routes in the queueing system, that is, triplets
$\mathcal{T}$
 in (1) corresponds to possible transition routes in the queueing system, that is, triplets 
 $(i,j,c)\in\{0,\dots,M\}$
 such that a class-c job currently in station i has a non-negative probability
$(i,j,c)\in\{0,\dots,M\}$
 such that a class-c job currently in station i has a non-negative probability 
 $p_{i,j}^c>0$
 of transitioning to station j, under the measure
$p_{i,j}^c>0$
 of transitioning to station j, under the measure 
 $\mathbb{P}$
.
$\mathbb{P}$
.
 
Space expansion. Assume the existence of an approximating measure 
 $\tilde{\mathbb{P}}$
 on an augmented space of network paths
$\tilde{\mathbb{P}}$
 on an augmented space of network paths 
 $\tilde{\mathcal{X}}$
, such that we further assign a mass to states with negative queue lengths. Under
$\tilde{\mathcal{X}}$
, such that we further assign a mass to states with negative queue lengths. Under 
 $\tilde{\mathbb{P}}$
, rates for transitions across states are induced by a generator
$\tilde{\mathbb{P}}$
, rates for transitions across states are induced by a generator 
 $\tilde{Q}=\tilde{Q}(\delta,\boldsymbol{\lambda})$
 with
$\tilde{Q}=\tilde{Q}(\delta,\boldsymbol{\lambda})$
 with 
 \begin{align}\tilde{Q}_{x,x'}=\delta+Q_{x,x'}\, , \quad \delta>0, \end{align}
\begin{align}\tilde{Q}_{x,x'}=\delta+Q_{x,x'}\, , \quad \delta>0, \end{align}
whenever 
 $x\xrightarrow{i,j,c} x'$
 and
$x\xrightarrow{i,j,c} x'$
 and 
 $(i,j,c)\in\mathcal{T}$
; that is, whenever a transition from state x to state x
$(i,j,c)\in\mathcal{T}$
; that is, whenever a transition from state x to state x
 $^{\prime}$
 is such that
$^{\prime}$
 is such that
- 
• a class-c job is serviced at station i and routed to j, and 
- 
•  $p_{i,j}^c>0$
 within the original probabilistic routing topology under $p_{i,j}^c>0$
 within the original probabilistic routing topology under $\mathbb{P}$
. $\mathbb{P}$
.
 In any other case, it holds that 
 $\tilde{Q}_{x,x'}=Q_{x,x'}=0$
. Note that the original connectivity structure of the QN is preserved under
$\tilde{Q}_{x,x'}=Q_{x,x'}=0$
. Note that the original connectivity structure of the QN is preserved under 
 $\tilde{\mathbb{P}}$
; however, as described below, jobs may be serviced within queueing stations even if they do not exist:
$\tilde{\mathbb{P}}$
; however, as described below, jobs may be serviced within queueing stations even if they do not exist:
- 
• Following (6), intensities for job transitions between stations i and j are strictly positive whenever  $p^c_{i,j}>0$
. If no class-c job exists within station i, then this intensity corresponds to $p^c_{i,j}>0$
. If no class-c job exists within station i, then this intensity corresponds to $\delta>0$
. $\delta>0$
.
- 
• In the event of a time-t class-c job departure from station i when  $\lim_{s\rightarrow t} X^{i,c}_s\leq0$
 (the job does not exist), we assume this job to be virtually generated, and 
i.e., a unit is subtracted from the state vector at the corresponding index. $\lim_{s\rightarrow t} X^{i,c}_s\leq0$
 (the job does not exist), we assume this job to be virtually generated, and 
i.e., a unit is subtracted from the state vector at the corresponding index. \begin{equation*}X^{i,c}_t = \lim_{s\rightarrow t} X^{i,c}_s - 1,\end{equation*} \begin{equation*}X^{i,c}_t = \lim_{s\rightarrow t} X^{i,c}_s - 1,\end{equation*}
- 
• This preserves the global job population count and forces network states to accommodate negative queue lengths. 
 For values of 
 $\delta$
 small enough, the
$\delta$
 small enough, the 
 $\tilde{\mathbb{P}}$
-density assigned to trajectories outside of
$\tilde{\mathbb{P}}$
-density assigned to trajectories outside of 
 $\mathcal{X}$
 is negligible. Note that a density
$\mathcal{X}$
 is negligible. Note that a density 
 $\tilde{f}$
 in (2) with generator
$\tilde{f}$
 in (2) with generator 
 $\tilde{Q}$
 in (6) is such that, for any network path
$\tilde{Q}$
 in (6) is such that, for any network path 
 $(\textbf{t},\textbf{x})\in\tilde{\mathcal{X}}\backslash\mathcal{X}$
,
$(\textbf{t},\textbf{x})\in\tilde{\mathcal{X}}\backslash\mathcal{X}$
, 
 \begin{equation*}\tilde{f}_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x})\leq \prod_{i=1}^I \tilde{Q}_{x_{i-1},x_i}=O(\delta^r)\quad \text{as} \quad\delta\rightarrow 0\end{equation*}
\begin{equation*}\tilde{f}_{X|\boldsymbol{\lambda}}(\textbf{t},\textbf{x})\leq \prod_{i=1}^I \tilde{Q}_{x_{i-1},x_i}=O(\delta^r)\quad \text{as} \quad\delta\rightarrow 0\end{equation*}
for some 
 $r\in\{1,\dots,I\}$
. Thus,
$r\in\{1,\dots,I\}$
. Thus, 
 $X_*\tilde{\mathbb{P}}(\mathcal{X})=1-\int_{\tilde{\mathcal{X}}\backslash\mathcal{X}} \tilde{f}_{X|\boldsymbol{\lambda}}d\tilde{\mu}_{\mathcal{X}}\xrightarrow{\delta\rightarrow 0} 1$
, where
$X_*\tilde{\mathbb{P}}(\mathcal{X})=1-\int_{\tilde{\mathcal{X}}\backslash\mathcal{X}} \tilde{f}_{X|\boldsymbol{\lambda}}d\tilde{\mu}_{\mathcal{X}}\xrightarrow{\delta\rightarrow 0} 1$
, where 
 $\tilde{\mu}$
 denotes an appropriately augmented base measure, and the limiting system dynamics under
$\tilde{\mu}$
 denotes an appropriately augmented base measure, and the limiting system dynamics under 
 $\tilde{\mathbb{P}}$
 will offer a perfect approximation to the original network model. The rest of the paper proceeds as follows:
$\tilde{\mathbb{P}}$
 will offer a perfect approximation to the original network model. The rest of the paper proceeds as follows:
- 
• In Section 4, we present a counting process Y over job transitions in the augmented network with generator  $\tilde{Q}$
 in (6), and parametrize an alternative absolutely continuous approximating measure $\tilde{Q}$
 in (6), and parametrize an alternative absolutely continuous approximating measure $\mathbb{Q}$
. In Lemma 1, we derive a lower bound to the equivalent log-likelihood for the network measurements. $\mathbb{Q}$
. In Lemma 1, we derive a lower bound to the equivalent log-likelihood for the network measurements.
- 
• Propositions 1 and 2 within Section 5 inspect the structure of  $\mathbb{Q}$
 that best approximates the target regular conditional probability. Corollaries 1 and 2 focus on the rate density $\mathbb{Q}$
 that best approximates the target regular conditional probability. Corollaries 1 and 2 focus on the rate density $d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
 by looking at conjugacy properties and limiting behaviour as $d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
 by looking at conjugacy properties and limiting behaviour as $\delta\rightarrow 0$
. $\delta\rightarrow 0$
.
- 
• Finally, Section 6 describes applications of our results within inferential tasks, allowing the approximation of (image) measures across the various service rates  $\mu$
, routing probabilities in $\mu$
, routing probabilities in $\mathcal{P}$
, and trajectories X, conditioned on network measurements. This includes comparisons with existing alternative methods. $\mathcal{P}$
, and trajectories X, conditioned on network measurements. This includes comparisons with existing alternative methods.
4. A counting process over job transitions
 A network system as introduced in Section 2 gives rise to a multivariate Markov counting process 
 $Y=(Y_t)_{t\geq 0}$
 on
$Y=(Y_t)_{t\geq 0}$
 on 
 $(\Omega,\mathcal{F})$
, where each indexed
$(\Omega,\mathcal{F})$
, where each indexed 
 $Y_t=(Y^{\boldsymbol{\eta}}_t)_{{\boldsymbol{\eta}}\in\mathcal{T}}\in\mathcal{S}^Y$
 accounts for job transitions across all classes in
$Y_t=(Y^{\boldsymbol{\eta}}_t)_{{\boldsymbol{\eta}}\in\mathcal{T}}\in\mathcal{S}^Y$
 accounts for job transitions across all classes in 
 $\mathcal{T}$
, up to a time
$\mathcal{T}$
, up to a time 
 $t\geq 0$
. That is, each
$t\geq 0$
. That is, each 
 $Y_t^{{\boldsymbol{\eta}}}$
 denotes the cumulative count in Y of transitions
$Y_t^{{\boldsymbol{\eta}}}$
 denotes the cumulative count in Y of transitions 
 $x\xrightarrow{{\boldsymbol{\eta}}} x'$
 in X, with
$x\xrightarrow{{\boldsymbol{\eta}}} x'$
 in X, with 
 $x,x'\in\mathcal{S}$
, and
$x,x'\in\mathcal{S}$
, and 
 $Y_0^{{\boldsymbol{\eta}}}=0$
 for all
$Y_0^{{\boldsymbol{\eta}}}=0$
 for all 
 $\boldsymbol{\eta}\in\mathcal{T}$
. At a basic level, these are simply non-decreasing counting processes for job transitions in the directions defined within
$\boldsymbol{\eta}\in\mathcal{T}$
. At a basic level, these are simply non-decreasing counting processes for job transitions in the directions defined within 
 $\mathcal{T}$
. We further note that
$\mathcal{T}$
. We further note that 
 $|\mathcal{T}|$
 is often small, as underlying network topologies impose strict routing mechanisms. The support set
$|\mathcal{T}|$
 is often small, as underlying network topologies impose strict routing mechanisms. The support set 
 $\mathcal{S}^Y$
 for the counting process is determined by the connectivity structure amongst the stations. Under the approximating measure
$\mathcal{S}^Y$
 for the counting process is determined by the connectivity structure amongst the stations. Under the approximating measure 
 $\tilde{\mathbb{P}}$
, it holds that
$\tilde{\mathbb{P}}$
, it holds that 
 $\mathcal{S}^Y=\mathbb{N}_0^{|\mathcal{T}|}$
, since job transitions may occur even if origin stations have no jobs queueing. Now, let
$\mathcal{S}^Y=\mathbb{N}_0^{|\mathcal{T}|}$
, since job transitions may occur even if origin stations have no jobs queueing. Now, let 
 \begin{equation*}\mathcal{T}_{i,c}^{\leftarrow}=\{\boldsymbol{{\boldsymbol{\eta}}}\in\mathcal{T} :\eta_2=i,\eta_3=c\} \quad \text{and} \quad \mathcal{T}_{i,c}^{\rightarrow}=\{\boldsymbol{\eta}\in\mathcal{T} :\eta_1=i,\eta_3=c\}\end{equation*}
\begin{equation*}\mathcal{T}_{i,c}^{\leftarrow}=\{\boldsymbol{{\boldsymbol{\eta}}}\in\mathcal{T} :\eta_2=i,\eta_3=c\} \quad \text{and} \quad \mathcal{T}_{i,c}^{\rightarrow}=\{\boldsymbol{\eta}\in\mathcal{T} :\eta_1=i,\eta_3=c\}\end{equation*}
respectively denote the subsets of 
 $\mathcal{T}$
 that include transitions of jobs in class
$\mathcal{T}$
 that include transitions of jobs in class 
 $c\in\mathcal{C}$
 to and from the network station
$c\in\mathcal{C}$
 to and from the network station 
 $i\in\{0,\dots,M\}$
. Also, recall that
$i\in\{0,\dots,M\}$
. Also, recall that 
 $X_t^{i,c}$
 denotes the number of class-c jobs in station
$X_t^{i,c}$
 denotes the number of class-c jobs in station 
 $i>0$
 at time
$i>0$
 at time 
 $t\geq 0$
; then
$t\geq 0$
; then 
 \begin{align}X^{i,c}_t = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  Y_t^{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} Y_t^{{\boldsymbol{\eta}}}   \end{align}
\begin{align}X^{i,c}_t = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  Y_t^{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} Y_t^{{\boldsymbol{\eta}}}   \end{align}
for all 
 $t\geq 0$
, assuming initially empty networked systems. We note that for all
$t\geq 0$
, assuming initially empty networked systems. We note that for all 
 ${\boldsymbol{\eta}}=(i,j,c)\in\mathcal{T}$
, it holds that
${\boldsymbol{\eta}}=(i,j,c)\in\mathcal{T}$
, it holds that 
 ${\boldsymbol{\eta}}\in\mathcal{T}_{j}^{\leftarrow}$
 and
${\boldsymbol{\eta}}\in\mathcal{T}_{j}^{\leftarrow}$
 and 
 ${\boldsymbol{\eta}}\in\mathcal{T}_{i}^{\rightarrow}$
. Thus, paths in X and Y differ in that the former is coupled, i.e., a job transition in the direction
${\boldsymbol{\eta}}\in\mathcal{T}_{i}^{\rightarrow}$
. Thus, paths in X and Y differ in that the former is coupled, i.e., a job transition in the direction 
 $\boldsymbol{\eta}=(i,j,c)$
 is relevant to (and thus is synchronized across) a pair of marginal processes
$\boldsymbol{\eta}=(i,j,c)$
 is relevant to (and thus is synchronized across) a pair of marginal processes 
 $(X^{i,c}_t)_{t\geq0},(X^{j,c}_t)_{t\geq0}$
; in the latter, this is only relevant to the indexed process
$(X^{i,c}_t)_{t\geq0},(X^{j,c}_t)_{t\geq0}$
; in the latter, this is only relevant to the indexed process 
 $(Y^{\boldsymbol{\eta}}_t)_{t\geq 0}$
.
$(Y^{\boldsymbol{\eta}}_t)_{t\geq 0}$
.
 In view of (7), we further define 
 $x_{i,c} = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} y_{{\boldsymbol{\eta}}} $
 as the class-c queue length in station
$x_{i,c} = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} y_{{\boldsymbol{\eta}}} $
 as the class-c queue length in station 
 $i>0$
 for any
$i>0$
 for any 
 $y\in\mathcal{S}^Y$
. Then the
$y\in\mathcal{S}^Y$
. Then the 
 $\tilde{\mathbb{P}}$
-associated infinitesimal generator matrix
$\tilde{\mathbb{P}}$
-associated infinitesimal generator matrix 
 $\Xi$
 of Y is such that
$\Xi$
 of Y is such that 
 $\textstyle\Xi_{y,y'}\equiv\textstyle\Xi_{y,{\boldsymbol{\eta}}} =\delta + \lambda_{{\boldsymbol{\eta}}}\cdot\big[\Upsilon(y,\eta_1,\eta_3)\vee 0\big]$
 with a station load
$\textstyle\Xi_{y,y'}\equiv\textstyle\Xi_{y,{\boldsymbol{\eta}}} =\delta + \lambda_{{\boldsymbol{\eta}}}\cdot\big[\Upsilon(y,\eta_1,\eta_3)\vee 0\big]$
 with a station load 
 \begin{align}\textstyle\Upsilon(y,i,c) = x_{i,c} \cdot \Bigg(\dfrac{K_i}{\sum_{c'\in\mathcal{C}}x_{i,c'}} \wedge 1 \Bigg) \end{align}
\begin{align}\textstyle\Upsilon(y,i,c) = x_{i,c} \cdot \Bigg(\dfrac{K_i}{\sum_{c'\in\mathcal{C}}x_{i,c'}} \wedge 1 \Bigg) \end{align}
for all jumps 
 $y\xrightarrow{{\boldsymbol{\eta}}} y'$
,
$y\xrightarrow{{\boldsymbol{\eta}}} y'$
, 
 ${\boldsymbol{\eta}}=(i,j,c)$
, where the origin station
${\boldsymbol{\eta}}=(i,j,c)$
, where the origin station 
 $i>0$
 has PS discipline (here we have set
$i>0$
 has PS discipline (here we have set 
 $0/0=0$
), and
$0/0=0$
), and 
 \begin{align}\textstyle\textstyle\Upsilon(y,i,c) = K_i \wedge x_{i,c}  \end{align}
\begin{align}\textstyle\textstyle\Upsilon(y,i,c) = K_i \wedge x_{i,c}  \end{align}
in stations 
 $i>0$
 with an FCFS policy within single-class networks. The station load
$i>0$
 with an FCFS policy within single-class networks. The station load 
 $\Upsilon(y,i,c)$
 differs from the queue length
$\Upsilon(y,i,c)$
 differs from the queue length 
 $x_{i,c}$
 in that it accounts for the weighting derived from the service discipline. We further have
$x_{i,c}$
 in that it accounts for the weighting derived from the service discipline. We further have 
 $\textstyle\Xi_{y,y'}=\delta+\lambda_{0,j,c}$
 for arrivals from virtual stations (in open networks) and
$\textstyle\Xi_{y,y'}=\delta+\lambda_{0,j,c}$
 for arrivals from virtual stations (in open networks) and 
 $\textstyle\Xi_{y,y'}=\delta+\lambda_{0,j,c} \cdot (N+\sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{0,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{0,c}} y_{{\boldsymbol{\eta}}})$
 for arrivals from delays, where N denotes the job population in a closed system. Finally,
$\textstyle\Xi_{y,y'}=\delta+\lambda_{0,j,c} \cdot (N+\sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{0,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{0,c}} y_{{\boldsymbol{\eta}}})$
 for arrivals from delays, where N denotes the job population in a closed system. Finally, 
 $\Xi_y \,{:}\,{\raise-1.5pt{=}}\, \Xi_{y,y} = - \sum_{y'\in\mathcal{S}^Y: y\neq y'} \Xi_{y,y'}$
.
$\Xi_y \,{:}\,{\raise-1.5pt{=}}\, \Xi_{y,y} = - \sum_{y'\in\mathcal{S}^Y: y\neq y'} \Xi_{y,y'}$
.
4.1. A variational decomposition
 The likelihood for observation events in (3) readily transfers to counts Y by means of (7); we thus may write 
 $ f_{O|Y_{t_k}}(\textbf{o}_k)\equiv f_{O|X_{t_k}}(\textbf{o}_k) $
. Under the measure
$ f_{O|Y_{t_k}}(\textbf{o}_k)\equiv f_{O|X_{t_k}}(\textbf{o}_k) $
. Under the measure 
 $\tilde{\mathbb{P}}$
, network states can have negative values; the likelihood is undefined in such instances. Now, note that piecewise
$\tilde{\mathbb{P}}$
, network states can have negative values; the likelihood is undefined in such instances. Now, note that piecewise 
 $\mathcal{S}^Y$
-valued trajectories also represent elements
$\mathcal{S}^Y$
-valued trajectories also represent elements 
 $(\textbf{t},\textbf{y})$
 in a space
$(\textbf{t},\textbf{y})$
 in a space 
 $(\mathcal{Y},\Sigma_{\mathcal{Y}})$
, similar to network paths in X. Let
$(\mathcal{Y},\Sigma_{\mathcal{Y}})$
, similar to network paths in X. Let 
 $f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K}$
 be a density function, with respect to some base measure
$f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K}$
 be a density function, with respect to some base measure 
 $\mu_\mathcal{Y}$
, where for all
$\mu_\mathcal{Y}$
, where for all 
 $B\in\Sigma_{\mathcal{Y}}$
,
$B\in\Sigma_{\mathcal{Y}}$
, 
 \begin{equation*}\mathbb{P}(Y^{-1}(B)|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K) = \int_B f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K} \,d\mu_\mathcal{Y}.\end{equation*}
\begin{equation*}\mathbb{P}(Y^{-1}(B)|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K) = \int_B f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K} \,d\mu_\mathcal{Y}.\end{equation*}
It may be shown by properties of conditional distributions that, conditioned on observations, Y is a non-homogeneous semi-Markov process with hazard functions
 \begin{equation}\Lambda_{y,y'}(t)=\Xi_{y,y'}\cdot\frac{\mathbb{P}(\bigcap_{k : t_k>t}O_k^{-1}(\textbf{o}_{\textbf{k}})|Y_t=y')}{\mathbb{P}(\bigcap_{k : t_k>t}O_k^{-1}(\textbf{o}_{\textbf{k}})|Y_t=y)} \end{equation}
\begin{equation}\Lambda_{y,y'}(t)=\Xi_{y,y'}\cdot\frac{\mathbb{P}(\bigcap_{k : t_k>t}O_k^{-1}(\textbf{o}_{\textbf{k}})|Y_t=y')}{\mathbb{P}(\bigcap_{k : t_k>t}O_k^{-1}(\textbf{o}_{\textbf{k}})|Y_t=y)} \end{equation}
for 
 $y'\neq y$
, and
$y'\neq y$
, and 
 $\Lambda_{y}(t)=-\sum_{y'\neq y}\Lambda_{y,y'}(t)$
, so that
$\Lambda_{y}(t)=-\sum_{y'\neq y}\Lambda_{y,y'}(t)$
, so that 
 \begin{align*}f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K}(\textbf{t},\textbf{y}) =  \pi(y_0)\, e^{\int_{t_I}^{T} \Lambda_{y_{I}}(u)du} \, \prod_{i=1}^I \Lambda_{y_{i-1},y_{i}}(t_i)\, e^{\int_{t_{i-1}}^{t_i} \Lambda_{y_{i-1}}(u)du}\, .\end{align*}
\begin{align*}f_{Y|\boldsymbol{\lambda},\textbf{o}_1,\dots,\textbf{o}_K}(\textbf{t},\textbf{y}) =  \pi(y_0)\, e^{\int_{t_I}^{T} \Lambda_{y_{I}}(u)du} \, \prod_{i=1}^I \Lambda_{y_{i-1},y_{i}}(t_i)\, e^{\int_{t_{i-1}}^{t_i} \Lambda_{y_{i-1}}(u)du}\, .\end{align*}
Here, 
 $\Xi\equiv \Xi(\delta,\boldsymbol{\lambda})$
 denotes the generator matrix associated with fixed values in
$\Xi\equiv \Xi(\delta,\boldsymbol{\lambda})$
 denotes the generator matrix associated with fixed values in 
 $\boldsymbol{\lambda}$
. For a deeper look at conditional jump processes we refer the reader to [Reference Serfozo25, Reference Daley and Vere-Jones9]. This conditional counting process is of key importance; however, the structure of rates in (10) poses a trivial analytical impediment. In our approximating effort, we assume the existence of an alternative measure
$\boldsymbol{\lambda}$
. For a deeper look at conditional jump processes we refer the reader to [Reference Serfozo25, Reference Daley and Vere-Jones9]. This conditional counting process is of key importance; however, the structure of rates in (10) poses a trivial analytical impediment. In our approximating effort, we assume the existence of an alternative measure 
 $\mathbb{Q}$
 on
$\mathbb{Q}$
 on 
 $(\Omega,\mathcal{F})$
. Under this measure, count trajectories in Y are subject to a full decomposition; that is, the
$(\Omega,\mathcal{F})$
. Under this measure, count trajectories in Y are subject to a full decomposition; that is, the 
 $\mathbb{Q}$
-law of Y is that of a family of
$\mathbb{Q}$
-law of Y is that of a family of 
 $|\mathcal{T}|$
 independent non-homogeneous Poisson counting processes with state-dependent intensity functions
$|\mathcal{T}|$
 independent non-homogeneous Poisson counting processes with state-dependent intensity functions 
 $\nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
, for all
$\nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
, for all 
 ${\boldsymbol{\eta}}\in\mathcal{T}$
. Here, the following hold:
${\boldsymbol{\eta}}\in\mathcal{T}$
. Here, the following hold:
- 
• Intensity rates for jumps  $y\xrightarrow{t,{\boldsymbol{\eta}}} y'$
, $y\xrightarrow{t,{\boldsymbol{\eta}}} y'$
, $t\geq 0$
, are independent of $t\geq 0$
, are independent of $\boldsymbol{\lambda}$
, change over time, and are given by $\boldsymbol{\lambda}$
, change over time, and are given by $\nu^{\boldsymbol{\eta}}_t(y_{\boldsymbol{\eta}})$
. $\nu^{\boldsymbol{\eta}}_t(y_{\boldsymbol{\eta}})$
.
- 
• Holding rates in Y evolve according to  $|\nu_t (Y_t)|$
, with $|\nu_t (Y_t)|$
, with $\nu_t (Y_t)=-\sum_{\boldsymbol{\eta}\in\mathcal{T}}\nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)$
. $\nu_t (Y_t)=-\sum_{\boldsymbol{\eta}\in\mathcal{T}}\nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)$
.
- 
• The state probability of the multivariate process Y factors across the job transition directions, so that for every \begin{equation*}\mathbb{Q}(Y_t=y) = \prod_{\boldsymbol{\eta}\in\mathcal{T}}\mathbb{Q}(Y^{\boldsymbol{\eta}}_t=y_{\boldsymbol{\eta}})\end{equation*} \begin{equation*}\mathbb{Q}(Y_t=y) = \prod_{\boldsymbol{\eta}\in\mathcal{T}}\mathbb{Q}(Y^{\boldsymbol{\eta}}_t=y_{\boldsymbol{\eta}})\end{equation*} $y\in\mathcal{S}^Y$
. $y\in\mathcal{S}^Y$
.
 Similar full-independence decompositions are often referred to as a variational mean-field approximations within Bayesian inferential settings [Reference Cohn, El-Hay, Friedman and Kupferman7]. In order to ensure computational tractability within forthcoming procedures, the intensity functions 
 $\nu$
 must be bounded from above by some arbitrary constant, so that
$\nu$
 must be bounded from above by some arbitrary constant, so that 
 $\nu^{\boldsymbol{\eta}}_t(y_{\boldsymbol{\eta}})\leq \bar{\nu}$
 for all
$\nu^{\boldsymbol{\eta}}_t(y_{\boldsymbol{\eta}})\leq \bar{\nu}$
 for all 
 $t>0,\boldsymbol{\eta}\in\mathcal{T}$
, and
$t>0,\boldsymbol{\eta}\in\mathcal{T}$
, and 
 $y\in\mathcal{S}^Y$
. We finally note that
$y\in\mathcal{S}^Y$
. We finally note that 
 $\mathbb{Q}$
 and
$\mathbb{Q}$
 and 
 $\tilde{\mathbb{P}}$
 are equivalent on
$\tilde{\mathbb{P}}$
 are equivalent on 
 $\mathcal{F}$
, as both assign a positive measure to every marginally-increasing sequence of
$\mathcal{F}$
, as both assign a positive measure to every marginally-increasing sequence of 
 $\mathbb{N}_0^{|\mathcal{T}|}$
-valued counts.
$\mathbb{N}_0^{|\mathcal{T}|}$
-valued counts.
4.2. Recapitulation and the variational lower bound
 It is important to observe that the intensity functions 
 $\nu^{\boldsymbol{\eta}}$
 completely define our alternative measure. In the rest of the paper, we inspect how to fine-tune these functions to ensure that
$\nu^{\boldsymbol{\eta}}$
 completely define our alternative measure. In the rest of the paper, we inspect how to fine-tune these functions to ensure that 
 $\mathbb{Q}$
 offers a good approximation to the measure
$\mathbb{Q}$
 offers a good approximation to the measure 
 $\mathbb{P}$
, conditioned on observations in
$\mathbb{P}$
, conditioned on observations in 
 $\textbf{O}$
. First, within the diagram in Figure 3 we offer an outline of the various measures, stochastic processes, probabilistic dependencies, and infinitesimal generators introduced up to this point. There, we also summarize the primary steps that help construct a variational framework to enable the inferential task. In short, these are as follows:
$\textbf{O}$
. First, within the diagram in Figure 3 we offer an outline of the various measures, stochastic processes, probabilistic dependencies, and infinitesimal generators introduced up to this point. There, we also summarize the primary steps that help construct a variational framework to enable the inferential task. In short, these are as follows:
- 
• The reference measure  $\mathbb{P}$
, along with the generator $\mathbb{P}$
, along with the generator $Q=Q(\boldsymbol{\lambda})$
, describes the real unknown stochastic network trajectories in X. $Q=Q(\boldsymbol{\lambda})$
, describes the real unknown stochastic network trajectories in X.
- 
•  $\tilde{\mathbb{P}}$
 is a minor departure from $\tilde{\mathbb{P}}$
 is a minor departure from $\mathbb{P}$
 and expands the support of X so that queue lengths can become negative. It is associated with the generators $\mathbb{P}$
 and expands the support of X so that queue lengths can become negative. It is associated with the generators $\tilde{Q}=\tilde{Q}(\delta,\boldsymbol{\lambda})$
 and $\tilde{Q}=\tilde{Q}(\delta,\boldsymbol{\lambda})$
 and $\Xi=\Xi(\delta,\boldsymbol{\lambda})$
, for trajectories in X and job transition counts in Y, respectively. $\Xi=\Xi(\delta,\boldsymbol{\lambda})$
, for trajectories in X and job transition counts in Y, respectively.
- 
•  $\mathbb{Q}$
 is an alternative measure under which Y is a set of independent non-homogeneous counting processes. It is associated with intensity functions $\mathbb{Q}$
 is an alternative measure under which Y is a set of independent non-homogeneous counting processes. It is associated with intensity functions $\nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 for each count process in the directions $\nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 for each count process in the directions $\boldsymbol{\eta}\in\mathcal{T}$
. $\boldsymbol{\eta}\in\mathcal{T}$
.

Figure 3: Overview of the approximating variational framework presented in this paper, summarizing the various measures used and the primary goals that each step will accomplish.
 In our next result, a variational lower bound will interrelate the transition rates 
 $\boldsymbol{\lambda}$
 for jumps under
$\boldsymbol{\lambda}$
 for jumps under 
 $\mathbb{P}$
 and
$\mathbb{P}$
 and 
 $\tilde{\mathbb{P}}$
 with the intensity functions
$\tilde{\mathbb{P}}$
 with the intensity functions 
 $\nu^{\boldsymbol{\eta}}$
, in light of the available observations/measurements in
$\nu^{\boldsymbol{\eta}}$
, in light of the available observations/measurements in 
 $\textbf{O}$
. From now on, transition rates
$\textbf{O}$
. From now on, transition rates 
 $\boldsymbol{\lambda}$
 are assumed mutually independent under
$\boldsymbol{\lambda}$
 are assumed mutually independent under 
 $\mathbb{Q}$
 and admit undetermined densities
$\mathbb{Q}$
 and admit undetermined densities 
 $d\mathbb{Q}_{\boldsymbol{\lambda}} = \{d\mathbb{Q}_{\lambda,\boldsymbol{\eta}}, \ \boldsymbol{\eta}\in\mathcal{T}\}$
 that must integrate to 1 on
$d\mathbb{Q}_{\boldsymbol{\lambda}} = \{d\mathbb{Q}_{\lambda,\boldsymbol{\eta}}, \ \boldsymbol{\eta}\in\mathcal{T}\}$
 that must integrate to 1 on 
 $(\mathbb{R}_+,\mathcal{B}(\mathbb{R}_+))$
.
$(\mathbb{R}_+,\mathcal{B}(\mathbb{R}_+))$
.
Lemma 1. (Variational lower bound.) Define 
 $\textbf{O}=\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)$
, and let
$\textbf{O}=\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)$
, and let 
 $\tilde{\mathbb{P}}$
 and
$\tilde{\mathbb{P}}$
 and 
 $\mathbb{Q}$
 be the probability measures on
$\mathbb{Q}$
 be the probability measures on 
 $(\Omega,\mathcal{F})$
 as defined above. Recall the notation
$(\Omega,\mathcal{F})$
 as defined above. Recall the notation 
 $\textstyle\Xi_{y,y'}\equiv\textstyle\Xi_{y,{\boldsymbol{\eta}}}$
 for jumps
$\textstyle\Xi_{y,y'}\equiv\textstyle\Xi_{y,{\boldsymbol{\eta}}}$
 for jumps 
 $y\xrightarrow{{\boldsymbol{\eta}}} y'$
 with direction
$y\xrightarrow{{\boldsymbol{\eta}}} y'$
 with direction 
 ${\boldsymbol{\eta}}$
. Then
${\boldsymbol{\eta}}$
. Then
 \begin{align}\log\tilde{\mathbb{P}}(\textbf{O}) &\geq \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k)  \big] -\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg] \nonumber \\[3pt]&\quad - \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\boldsymbol{\lambda}} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  - \Xi_{Y_t} + \nu_t(Y_t) \bigg] dt \end{align}
\begin{align}\log\tilde{\mathbb{P}}(\textbf{O}) &\geq \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k)  \big] -\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg] \nonumber \\[3pt]&\quad - \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\boldsymbol{\lambda}} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  - \Xi_{Y_t} + \nu_t(Y_t) \bigg] dt \end{align}
offers a lower bound on the 
 $\tilde{\mathbb{P}}$
-probability of retrieved observation events.
$\tilde{\mathbb{P}}$
-probability of retrieved observation events.
Proof. Note that
 \begin{align}\log\tilde{\mathbb{P}}(\textbf{O})&=\log\int_{\mathcal{Y}\times\mathbb{R}_+^{|\mathcal{T}|}} \mathbb{P}(\textbf{O}|Y)\, d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}} = \log \mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \mathbb{P}(\textbf{O}|Y)\, \frac{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}}{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}} \bigg] \nonumber \\[3pt] &\geq \mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log  \mathbb{P}(\textbf{O}|Y)\big]  - \mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg], \end{align}
\begin{align}\log\tilde{\mathbb{P}}(\textbf{O})&=\log\int_{\mathcal{Y}\times\mathbb{R}_+^{|\mathcal{T}|}} \mathbb{P}(\textbf{O}|Y)\, d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}} = \log \mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \mathbb{P}(\textbf{O}|Y)\, \frac{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}}{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}} \bigg] \nonumber \\[3pt] &\geq \mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log  \mathbb{P}(\textbf{O}|Y)\big]  - \mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg], \end{align}
where we use Jensen’s inequality for finite measures. This is known as a variational lower bound on the log-likelihood, and
 \begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log  \mathbb{P}(\textbf{O}|Y)\big] &= \mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log   \prod_{k=1}^K f_{O|Y_{t_k}}(\textbf{o}_k)  \big] = \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k)  \big]\end{align*}
\begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log  \mathbb{P}(\textbf{O}|Y)\big] &= \mathbb{E}^{\mathbb{Q}}_{Y} \big[ \log   \prod_{k=1}^K f_{O|Y_{t_k}}(\textbf{o}_k)  \big] = \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k)  \big]\end{align*}
follows directly from (3). The negative part in (12) is the Kullback–Leibler (KL) divergence between image measures of 
 $\mathbb{Q}$
 and
$\mathbb{Q}$
 and 
 $\tilde{\mathbb{P}}$
. By noting that these share base measures, and
$\tilde{\mathbb{P}}$
. By noting that these share base measures, and 
 $Y,\boldsymbol{\lambda}$
 are independent under
$Y,\boldsymbol{\lambda}$
 are independent under 
 $\mathbb{Q}$
, we have
$\mathbb{Q}$
, we have 
 \begin{align}\mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg] &= \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg] \, + \, \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[ \mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}}  \bigg] \bigg] , \end{align}
\begin{align}\mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg] &= \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg] \, + \, \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[ \mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}}  \bigg] \bigg] , \end{align}
where 
 $g_Y$
 and
$g_Y$
 and 
 $f_{Y|\boldsymbol{\lambda}}$
 denote the Y-trajectory densities associated with rates
$f_{Y|\boldsymbol{\lambda}}$
 denote the Y-trajectory densities associated with rates 
 $\nu^{\boldsymbol{\eta}}$
 and
$\nu^{\boldsymbol{\eta}}$
 and 
 $\Xi(\boldsymbol{\lambda})$
, respectively. The last term in (13) is a
$\Xi(\boldsymbol{\lambda})$
, respectively. The last term in (13) is a 
 $\mathbb{Q}$
-average of the KL divergence on Y, where the mean is taken across the infinitesimal transition rates. For a fixed starting
$\mathbb{Q}$
-average of the KL divergence on Y, where the mean is taken across the infinitesimal transition rates. For a fixed starting 
 $Y_0\in\mathcal{S}^Y$
, the inner expectation is shown in [Reference Opper and Sanguinetti18] to take the equivalent form
$Y_0\in\mathcal{S}^Y$
, the inner expectation is shown in [Reference Opper and Sanguinetti18] to take the equivalent form 
 \begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}} \bigg] &= \lim_{R\rightarrow\infty} \sum_{r=0}^{R-1}  \mathbb{E}^{\mathbb{Q}}_{Y_{\frac{Tr}{R}}} \Bigg[ \sum_{y\in\mathcal{S}^Y} \mathbb{Q}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}}) \log\frac{\mathbb{Q}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}})}{\tilde{\mathbb{P}}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}},\boldsymbol{\lambda})} \Bigg] .\end{align*}
\begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}} \bigg] &= \lim_{R\rightarrow\infty} \sum_{r=0}^{R-1}  \mathbb{E}^{\mathbb{Q}}_{Y_{\frac{Tr}{R}}} \Bigg[ \sum_{y\in\mathcal{S}^Y} \mathbb{Q}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}}) \log\frac{\mathbb{Q}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}})}{\tilde{\mathbb{P}}(Y_{\frac{T(r+1)}{R}}=y|Y_{\frac{Tr}{R}},\boldsymbol{\lambda})} \Bigg] .\end{align*}
Note that within an infinitesimal time interval a jump in Y may only happen in one direction within 
 $\mathcal{T}$
. With this in mind, we retrieve the limit of a Riemann sum in the interval [0, T], i.e.
$\mathcal{T}$
. With this in mind, we retrieve the limit of a Riemann sum in the interval [0, T], i.e. 
 \begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}} \bigg]  = &\lim_{R\rightarrow\infty} \frac{T}{R }\sum_{r=0}^{R-1}  \mathbb{E}^{\mathbb{Q}}_{Y_{\frac{Tr}{R}}} \Bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_{\frac{Tr}{R}}(Y^{\boldsymbol{\eta}}_{\frac{Tr}{R}}) \log \frac{ \nu^{\boldsymbol{\eta}}_{\frac{Tr}{R}}(Y^{\boldsymbol{\eta}}_{\frac{Tr}{R}})}{\Xi_{Y_{\frac{Tr}{R}},{\boldsymbol{\eta}}}} +  \frac{R}{T} \log\frac{1 + \frac{T}{R}  \nu_{\frac{Tr}{R}}(Y_{\frac{Tr}{R}})}{1+ \frac{T}{R}   \Xi_{Y_{\frac{Tr}{R}}}} \Bigg] \\[3pt]  = &\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  - \Xi_{Y_t} + \nu_t(Y_t) \bigg] dt ,\end{align*}
\begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y} \bigg[ \log  \frac{g_Y}{f_{Y|\boldsymbol{\lambda}}} \bigg]  = &\lim_{R\rightarrow\infty} \frac{T}{R }\sum_{r=0}^{R-1}  \mathbb{E}^{\mathbb{Q}}_{Y_{\frac{Tr}{R}}} \Bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_{\frac{Tr}{R}}(Y^{\boldsymbol{\eta}}_{\frac{Tr}{R}}) \log \frac{ \nu^{\boldsymbol{\eta}}_{\frac{Tr}{R}}(Y^{\boldsymbol{\eta}}_{\frac{Tr}{R}})}{\Xi_{Y_{\frac{Tr}{R}},{\boldsymbol{\eta}}}} +  \frac{R}{T} \log\frac{1 + \frac{T}{R}  \nu_{\frac{Tr}{R}}(Y_{\frac{Tr}{R}})}{1+ \frac{T}{R}   \Xi_{Y_{\frac{Tr}{R}}}} \Bigg] \\[3pt]  = &\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  - \Xi_{Y_t} + \nu_t(Y_t) \bigg] dt ,\end{align*}
and
 \begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg] = \ &\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg]\\[3pt] &+ \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\boldsymbol{\lambda}} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\tilde{\Xi}_{Y_t,{\boldsymbol{\eta}}}}  - \tilde{\Xi}_{Y_t} + \nu_t(Y_t) \bigg] dt\end{align*}
\begin{align*}\mathbb{E}^{\mathbb{Q}}_{Y,\boldsymbol{\lambda}} \bigg[ \log  \frac{d(Y,\boldsymbol{\lambda})_*\mathbb{Q}}{d(Y,\boldsymbol{\lambda})_*\tilde{\mathbb{P}}} \bigg] = \ &\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}} \bigg[  \log\frac{ d\mathbb{Q}_{\boldsymbol{\lambda}} }{d\mathbb{P}_{\boldsymbol{\lambda}}} \bigg]\\[3pt] &+ \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\boldsymbol{\lambda}} \bigg[  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\tilde{\Xi}_{Y_t,{\boldsymbol{\eta}}}}  - \tilde{\Xi}_{Y_t} + \nu_t(Y_t) \bigg] dt\end{align*}
completes the proof.
 
 Thus, the lower bound in (11) depends on both the latent variables Y and 
 $\boldsymbol{\lambda}$
, accounting for the various counts and rates. On a basic level, this is built from three distinct components: the expected log-observations, the KL divergence across service rate densities, and a
$\boldsymbol{\lambda}$
, accounting for the various counts and rates. On a basic level, this is built from three distinct components: the expected log-observations, the KL divergence across service rate densities, and a 
 $\mathbb{Q}$
-weighted divergence across hazard functions and rates, further integrated along the entire network trajectory.
$\mathbb{Q}$
-weighted divergence across hazard functions and rates, further integrated along the entire network trajectory.
5. A functional representation
 The above bound includes the prior rates density 
 $d\mathbb{P}_{\boldsymbol{\lambda}}$
 along with the
$d\mathbb{P}_{\boldsymbol{\lambda}}$
 along with the 
 $\tilde{\mathbb{P}}$
-generator
$\tilde{\mathbb{P}}$
-generator 
 $\Xi$
 for the network with negative queues. In addition, we find the unknown
$\Xi$
 for the network with negative queues. In addition, we find the unknown 
 $\mathbb{Q}$
-distribution for the count random variables
$\mathbb{Q}$
-distribution for the count random variables 
 $Y_t$
, along with the hazard functions
$Y_t$
, along with the hazard functions 
 $\nu$
. Hence, by maximizing this bound with respect to
$\nu$
. Hence, by maximizing this bound with respect to 
 $\nu$
, we may derive properties on
$\nu$
, we may derive properties on 
 $\mathbb{Q}$
 that allow for the construction of approximating distributions to
$\mathbb{Q}$
 that allow for the construction of approximating distributions to 
 $d\mathbb{P}_{X|\textbf{o}_1,\dots,\textbf{o}_K}$
 and
$d\mathbb{P}_{X|\textbf{o}_1,\dots,\textbf{o}_K}$
 and 
 $d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
. In this section, we begin by generalizing work in [Reference Opper and Sanguinetti18] and present results that (i) accommodate parameter uncertainty in transition rates and (ii) impose computational restrictions in the resulting iterative system of equations. Later, we move on to inspect posterior rate densities and conjugacy properties as
$d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
. In this section, we begin by generalizing work in [Reference Opper and Sanguinetti18] and present results that (i) accommodate parameter uncertainty in transition rates and (ii) impose computational restrictions in the resulting iterative system of equations. Later, we move on to inspect posterior rate densities and conjugacy properties as 
 $\delta\rightarrow 0$
.
$\delta\rightarrow 0$
.
Proposition 1. Let 
 $d\mathbb{Q}_{\boldsymbol{\lambda}}$
 be some valid joint density assigned to the instantaneous rates
$d\mathbb{Q}_{\boldsymbol{\lambda}}$
 be some valid joint density assigned to the instantaneous rates 
 $\boldsymbol{\lambda}$
 under the approximating variational measure
$\boldsymbol{\lambda}$
 under the approximating variational measure 
 $\mathbb{Q}$
. Also, define
$\mathbb{Q}$
. Also, define 
 $Y^{\backslash\boldsymbol{\eta}}_t = \{ Y^{\boldsymbol{\eta}'}_t  \, :\, \boldsymbol{\eta}'\in\mathcal{T} \backslash \{\boldsymbol{\eta}\}\}$
. Then the
$Y^{\backslash\boldsymbol{\eta}}_t = \{ Y^{\boldsymbol{\eta}'}_t  \, :\, \boldsymbol{\eta}'\in\mathcal{T} \backslash \{\boldsymbol{\eta}\}\}$
. Then the 
 $\mathbb{Q}$
-dynamics of Y that optimize the lower bound (11) may be parametrized by a system of equations, so that the intensity functions
$\mathbb{Q}$
-dynamics of Y that optimize the lower bound (11) may be parametrized by a system of equations, so that the intensity functions 
 $\nu_t^{\boldsymbol{\eta}}(y)\leq\bar{\nu}$
 are given by
$\nu_t^{\boldsymbol{\eta}}(y)\leq\bar{\nu}$
 are given by 
 \begin{align}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) }  \end{align}
\begin{align}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) }  \end{align}
for all 
 $t\in[0,T]$
,
$t\in[0,T]$
, 
 $\boldsymbol{\eta}\in\mathcal{T}$
, and
$\boldsymbol{\eta}\in\mathcal{T}$
, and 
 $y\in\mathbb{N}_0$
, with
$y\in\mathbb{N}_0$
, with 
 $\kappa_t^{\boldsymbol{\eta}}(y)\geq 0$
 and
$\kappa_t^{\boldsymbol{\eta}}(y)\geq 0$
 and 
 \begin{align}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} = \ &r_t^{\boldsymbol{\eta}}(y)  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] \nonumber\\&- \bigg(1+\frac{k_t^{\boldsymbol{\eta}}(y)}{\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}\bigg)  r_t^{\boldsymbol{\eta}}(y+1) \frac{e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}}{e^{ k_t^{\boldsymbol{\eta}}(y)/\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}} \end{align}
\begin{align}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} = \ &r_t^{\boldsymbol{\eta}}(y)  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] \nonumber\\&- \bigg(1+\frac{k_t^{\boldsymbol{\eta}}(y)}{\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}\bigg)  r_t^{\boldsymbol{\eta}}(y+1) \frac{e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}}{e^{ k_t^{\boldsymbol{\eta}}(y)/\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}} \end{align}
whenever 
 $t\neq t_k$
,
$t\neq t_k$
, 
 $k=1,\dots,K$
, and
$k=1,\dots,K$
, and 
 \begin{align}\lim_{t\rightarrow t^-_k} r_t^{\boldsymbol{\eta}}(y) = r_{t_k}^{\boldsymbol{\eta}}(y) \exp \left( \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k) | Y_{t_k}^{\boldsymbol{\eta}}=y \big] \right) \end{align}
\begin{align}\lim_{t\rightarrow t^-_k} r_t^{\boldsymbol{\eta}}(y) = r_{t_k}^{\boldsymbol{\eta}}(y) \exp \left( \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k) | Y_{t_k}^{\boldsymbol{\eta}}=y \big] \right) \end{align}
at network observation times. In addition, 
 $\kappa_t^{\boldsymbol{\eta}}(y)(\nu_t^{\boldsymbol{\eta}}(y)-\bar{\nu})=0$
.
$\kappa_t^{\boldsymbol{\eta}}(y)(\nu_t^{\boldsymbol{\eta}}(y)-\bar{\nu})=0$
.
 
Proof. We identify a stationary point to the Lagrangian associated with this constrained optimization problem, where optimization is with respect to the jump rates and the finite-dimensional distributions of Y, subject to 
 $\nu_t^{\boldsymbol{\eta}}(y)\leq \bar{\nu}$
 and the master equation
$\nu_t^{\boldsymbol{\eta}}(y)\leq \bar{\nu}$
 and the master equation 
 \begin{align}\frac{d\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}{dt} =\nu_t^{\boldsymbol{\eta}}(y-1)\cdot\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y-1) - \nu_t^{\boldsymbol{\eta}}(y)\cdot \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) \end{align}
\begin{align}\frac{d\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)}{dt} =\nu_t^{\boldsymbol{\eta}}(y-1)\cdot\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y-1) - \nu_t^{\boldsymbol{\eta}}(y)\cdot \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) \end{align}
for 
 $y\geq 1$
, with
$y\geq 1$
, with 
 $d\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=0)= - \nu_t^{\boldsymbol{\eta}}(0) \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=0)dt$
. Denote by
$d\mathbb{Q}(Y_t^{\boldsymbol{\eta}}=0)= - \nu_t^{\boldsymbol{\eta}}(0) \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=0)dt$
. Denote by 
 $\phi_t^{\boldsymbol{\eta}}(y) = \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)$
 the functional representing the marginal
$\phi_t^{\boldsymbol{\eta}}(y) = \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y)$
 the functional representing the marginal 
 $\mathbb{Q}$
-probability of the state
$\mathbb{Q}$
-probability of the state 
 $Y_t$
 in the direction of
$Y_t$
 in the direction of 
 $\boldsymbol{\eta}$
, for all
$\boldsymbol{\eta}$
, for all 
 $y\in\mathbb{N}_0$
. In view of (11), the object function may be expressed as the functional
$y\in\mathbb{N}_0$
. In view of (11), the object function may be expressed as the functional 
 \begin{align*}\Phi[\phi,\nu,l]= \ &C + \sum_{k=1}^K \mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k) \big]\\&-  \int_0^T  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} \big[ \Psi[Y^{\boldsymbol{\eta}}_t,\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) , \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}),l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})] \big] dt\end{align*}
\begin{align*}\Phi[\phi,\nu,l]= \ &C + \sum_{k=1}^K \mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  f_{O|Y_{t_k}}(\textbf{o}_k) \big]\\&-  \int_0^T  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} \big[ \Psi[Y^{\boldsymbol{\eta}}_t,\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) , \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}),l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})] \big] dt\end{align*}
with
 \begin{align*}\Psi[Y^{\boldsymbol{\eta}}_t,\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) , \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}),l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})] &= \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) \Big( \log  \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} |  Y^{\boldsymbol{\eta}}_t  \big]  - 1\Big) \\&\quad + \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}}  |  Y^{\boldsymbol{\eta}}_t \big]  - l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) \Big(\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) + \frac{d\log \phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})}{dt} \Big) \\&\quad + l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}+1)\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) - \frac{k_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})}{\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})} \big(\bar{\nu}-\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})\big)   ,\end{align*}
\begin{align*}\Psi[Y^{\boldsymbol{\eta}}_t,\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) , \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}),l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})] &= \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) \Big( \log  \nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} |  Y^{\boldsymbol{\eta}}_t  \big]  - 1\Big) \\&\quad + \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}}  |  Y^{\boldsymbol{\eta}}_t \big]  - l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) \Big(\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) + \frac{d\log \phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})}{dt} \Big) \\&\quad + l_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}+1)\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}}) - \frac{k_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})}{\phi_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})} \big(\bar{\nu}-\nu_t^{\boldsymbol{\eta}}(Y_t^{\boldsymbol{\eta}})\big)   ,\end{align*}
where 
 $l^{\boldsymbol{\eta}} = (l^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 and
$l^{\boldsymbol{\eta}} = (l^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 and 
 $k^{\boldsymbol{\eta}} = (k^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 are multiplier functions that ensure that (17) and the complementary inequality on rates are satisfied. Above, the term C includes the remainder terms in the lower bound in (11) that are independent of the finite-dimensional distributions of Y under
$k^{\boldsymbol{\eta}} = (k^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 are multiplier functions that ensure that (17) and the complementary inequality on rates are satisfied. Above, the term C includes the remainder terms in the lower bound in (11) that are independent of the finite-dimensional distributions of Y under 
 $\mathbb{Q}$
. Hence, we obtain the functional derivatives
$\mathbb{Q}$
. Hence, we obtain the functional derivatives 
 \begin{align*}\frac{\delta\Phi}{\delta\phi_t^{\boldsymbol{\eta}}(y)} &= \sum_{k=1}^K \delta(t-t_k) \,\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|Y_t}(\textbf{o}_k) | Y_t^{\boldsymbol{\eta}}=y \big]  - \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \frac{d l_t^{\boldsymbol{\eta}}(y)}{dt}  \\&\quad - \nu_t^{\boldsymbol{\eta}}(y) \Big( \log  \nu_t^{\boldsymbol{\eta}}(y) -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} |  Y^{\boldsymbol{\eta}}_t = y \big]  - 1 -  l_t^{\boldsymbol{\eta}}(y) +  l_t^{\boldsymbol{\eta}}(y+1) \Big) \ \end{align*}
\begin{align*}\frac{\delta\Phi}{\delta\phi_t^{\boldsymbol{\eta}}(y)} &= \sum_{k=1}^K \delta(t-t_k) \,\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|Y_t}(\textbf{o}_k) | Y_t^{\boldsymbol{\eta}}=y \big]  - \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \frac{d l_t^{\boldsymbol{\eta}}(y)}{dt}  \\&\quad - \nu_t^{\boldsymbol{\eta}}(y) \Big( \log  \nu_t^{\boldsymbol{\eta}}(y) -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} |  Y^{\boldsymbol{\eta}}_t = y \big]  - 1 -  l_t^{\boldsymbol{\eta}}(y) +  l_t^{\boldsymbol{\eta}}(y+1) \Big) \ \end{align*}
and
 \begin{align*}\frac{\delta\Phi}{\delta\nu_t^{\boldsymbol{\eta}}(y)} &= - \phi_t^{\boldsymbol{\eta}}(y) \big( \log  \nu_t^{\boldsymbol{\eta}}(y)  -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] + l_t^{\boldsymbol{\eta}}(y+1) - l_t^{\boldsymbol{\eta}}(y) \big) - k_t^{\boldsymbol{\eta}}(y) \ ,\end{align*}
\begin{align*}\frac{\delta\Phi}{\delta\nu_t^{\boldsymbol{\eta}}(y)} &= - \phi_t^{\boldsymbol{\eta}}(y) \big( \log  \nu_t^{\boldsymbol{\eta}}(y)  -  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] + l_t^{\boldsymbol{\eta}}(y+1) - l_t^{\boldsymbol{\eta}}(y) \big) - k_t^{\boldsymbol{\eta}}(y) \ ,\end{align*}
for all 
 $t\in[0,T]$
,
$t\in[0,T]$
, 
 $\boldsymbol{\eta}\in\mathcal{T}$
, and
$\boldsymbol{\eta}\in\mathcal{T}$
, and 
 $y\in\mathbb{N}_0$
, to be complemented by the slackness conditions
$y\in\mathbb{N}_0$
, to be complemented by the slackness conditions 
 $\kappa_t^{\boldsymbol{\eta}}(y)(\nu_t^{\boldsymbol{\eta}}(y)-\bar{\nu})=0$
,
$\kappa_t^{\boldsymbol{\eta}}(y)(\nu_t^{\boldsymbol{\eta}}(y)-\bar{\nu})=0$
, 
 $\kappa_t^{\boldsymbol{\eta}}(y)\geq 0$
, and
$\kappa_t^{\boldsymbol{\eta}}(y)\geq 0$
, and 
 $\nu_t^{\boldsymbol{\eta}}(y)\leq\bar{\nu}$
. By letting
$\nu_t^{\boldsymbol{\eta}}(y)\leq\bar{\nu}$
. By letting 
 $l_t^{\boldsymbol{\eta}}(y)=-\log r_t^{\boldsymbol{\eta}}(y)$
 and setting the above expressions to 0, we obtain
$l_t^{\boldsymbol{\eta}}(y)=-\log r_t^{\boldsymbol{\eta}}(y)$
 and setting the above expressions to 0, we obtain 
 \begin{align*}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} &= r_t^{\boldsymbol{\eta}}(y) \cdot \bigg( \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] -  \sum_{k=1}^K \delta(t-t_k) \,\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|Y_t}(\textbf{o}_k) | Y_t^{\boldsymbol{\eta}}=y \big] \bigg) \\&\quad - \bigg(1+\frac{k_t^{\boldsymbol{\eta}}(y)}{\phi_t^{\boldsymbol{\eta}}(y)}\bigg) \cdot r_t^{\boldsymbol{\eta}}(y+1) \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\phi_t^{\boldsymbol{\eta}}(y) }\end{align*}
\begin{align*}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} &= r_t^{\boldsymbol{\eta}}(y) \cdot \bigg( \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] -  \sum_{k=1}^K \delta(t-t_k) \,\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|Y_t}(\textbf{o}_k) | Y_t^{\boldsymbol{\eta}}=y \big] \bigg) \\&\quad - \bigg(1+\frac{k_t^{\boldsymbol{\eta}}(y)}{\phi_t^{\boldsymbol{\eta}}(y)}\bigg) \cdot r_t^{\boldsymbol{\eta}}(y+1) \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\phi_t^{\boldsymbol{\eta}}(y) }\end{align*}
and
 \begin{align*}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\phi_t^{\boldsymbol{\eta}}(y) } \ .\end{align*}
\begin{align*}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - k_t^{\boldsymbol{\eta}}(y)/\phi_t^{\boldsymbol{\eta}}(y) } \ .\end{align*}
Observe above that, for fixed values of 
 $\phi$
 and r, if
$\phi$
 and r, if 
 \begin{equation*}\frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}<\bar{\nu},\end{equation*}
\begin{equation*}\frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}<\bar{\nu},\end{equation*}
then the complementary slackness conditions imply 
 $k_t^{\boldsymbol{\eta}}(y)=0$
; otherwise,
$k_t^{\boldsymbol{\eta}}(y)=0$
; otherwise, 
 $\nu_t^{\boldsymbol{\eta}}(y)=\bar{\nu}$
 and
$\nu_t^{\boldsymbol{\eta}}(y)=\bar{\nu}$
 and 
 \begin{equation*}k_t^{\boldsymbol{\eta}}(y) = \phi_t^{\boldsymbol{\eta}}(y) \cdot \big[ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \log\frac{\bar{\nu} \cdot r_t^{\boldsymbol{\eta}}(y)}{r_t^{\boldsymbol{\eta}}(y+1)} \big] \geq 0,\end{equation*}
\begin{equation*}k_t^{\boldsymbol{\eta}}(y) = \phi_t^{\boldsymbol{\eta}}(y) \cdot \big[ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \log\frac{\bar{\nu} \cdot r_t^{\boldsymbol{\eta}}(y)}{r_t^{\boldsymbol{\eta}}(y+1)} \big] \geq 0,\end{equation*}
yielding a valid system of equations, which leads to (14)–(16) and concludes the proof.
 
Corollary 1. (Distributed network monitoring.) Assume that network observations are distributed and independent across the stations, so that
 \begin{align*}f_{O|Y_{t_k}}(\textbf{o}_k) = \prod_{i=1}^M f_{O|\{ Y^{\boldsymbol{\eta}}_{t_k}  \, :\, \boldsymbol{\eta}\in\mathcal{T}_i  \}}(\textbf{o}_k^i)\end{align*}
\begin{align*}f_{O|Y_{t_k}}(\textbf{o}_k) = \prod_{i=1}^M f_{O|\{ Y^{\boldsymbol{\eta}}_{t_k}  \, :\, \boldsymbol{\eta}\in\mathcal{T}_i  \}}(\textbf{o}_k^i)\end{align*}
for some conditional mass function
 \begin{equation*}f_{O|\{ Y^{\boldsymbol{\eta}}_{t_k}  \, :\, \boldsymbol{\eta}\in\mathcal{T}_i  \}},\end{equation*}
\begin{equation*}f_{O|\{ Y^{\boldsymbol{\eta}}_{t_k}  \, :\, \boldsymbol{\eta}\in\mathcal{T}_i  \}},\end{equation*}
where 
 $\mathcal{T}_i=(\cup_{c\in\mathcal{C}}\mathcal{T}_{i,c}^{\leftarrow}) \, \cup  \, (\cup_{c\in\mathcal{C}}\mathcal{T}_{i,c}^{\rightarrow})$
 is the set of job transitions relevant to network activity in station
$\mathcal{T}_i=(\cup_{c\in\mathcal{C}}\mathcal{T}_{i,c}^{\leftarrow}) \, \cup  \, (\cup_{c\in\mathcal{C}}\mathcal{T}_{i,c}^{\rightarrow})$
 is the set of job transitions relevant to network activity in station 
 $i>0$
, and
$i>0$
, and 
 $\textbf{o}^i_k$
 denotes the time-
$\textbf{o}^i_k$
 denotes the time-
 $t_k$
 observations across classes in the station. Further assume that
$t_k$
 observations across classes in the station. Further assume that 
 $\bar{\nu}=\infty$
, so that there exists no bound on intensity rates
$\bar{\nu}=\infty$
, so that there exists no bound on intensity rates 
 $\nu_t^{\boldsymbol{\eta}}(y)$
 under
$\nu_t^{\boldsymbol{\eta}}(y)$
 under 
 $\mathbb{Q}$
. Then the system of equations in Proposition 1 reduces to
$\mathbb{Q}$
. Then the system of equations in Proposition 1 reduces to 
 \begin{align*}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] }\end{align*}
\begin{align*}\nu_t^{\boldsymbol{\eta}}(y) =  \frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] }\end{align*}
for all 
 $t\in[0,T]$
,
$t\in[0,T]$
, 
 $\boldsymbol{\eta}\in\mathcal{T}$
, and
$\boldsymbol{\eta}\in\mathcal{T}$
, and 
 $y\in\mathbb{N}_0$
, with
$y\in\mathbb{N}_0$
, with 
 \begin{align*}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} = r_t^{\boldsymbol{\eta}}(y)  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - r_t^{\boldsymbol{\eta}}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}\end{align*}
\begin{align*}\frac{dr_t^{\boldsymbol{\eta}}(y)}{dt} = r_t^{\boldsymbol{\eta}}(y)  \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - r_t^{\boldsymbol{\eta}}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}\end{align*}
whenever 
 $t\neq t_k$
,
$t\neq t_k$
, 
 $k=1,\dots,K$
, and
$k=1,\dots,K$
, and 
 \begin{align*}\lim_{t\rightarrow t^-_k} r_t^{\boldsymbol{\eta}}(y) = \ &r_{t_k}^{\boldsymbol{\eta}}(y) e^{ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|\{ Y^{\boldsymbol{\eta}'}_{t}  \, :\, \boldsymbol{\eta}'\in\mathcal{T}_{\eta_1}  \}}(\textbf{o}_k^{\eta_1}) | Y_t^{\boldsymbol{\eta}}=y \big]}\\&\cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|\{ Y^{\boldsymbol{\eta}'}_{t}  \, :\, \boldsymbol{\eta}'\in\mathcal{T}_{\eta_2}  \}}(\textbf{o}_k^{\eta_2}) | Y_t^{\boldsymbol{\eta}}=y \big]}\end{align*}
\begin{align*}\lim_{t\rightarrow t^-_k} r_t^{\boldsymbol{\eta}}(y) = \ &r_{t_k}^{\boldsymbol{\eta}}(y) e^{ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|\{ Y^{\boldsymbol{\eta}'}_{t}  \, :\, \boldsymbol{\eta}'\in\mathcal{T}_{\eta_1}  \}}(\textbf{o}_k^{\eta_1}) | Y_t^{\boldsymbol{\eta}}=y \big]}\\&\cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t} \big[ \log  f_{O|\{ Y^{\boldsymbol{\eta}'}_{t}  \, :\, \boldsymbol{\eta}'\in\mathcal{T}_{\eta_2}  \}}(\textbf{o}_k^{\eta_2}) | Y_t^{\boldsymbol{\eta}}=y \big]}\end{align*}
accounting for observations at origin and departure stations in 
 $\boldsymbol{\eta}\in\mathcal{T}$
.
$\boldsymbol{\eta}\in\mathcal{T}$
.
 In Proposition 1, we derived a system of equations with iterated dependencies, given a distribution 
 $\mathbb{Q}_{\boldsymbol{\lambda}}$
 for the service rates. These equations can reconstruct the hazard rates
$\mathbb{Q}_{\boldsymbol{\lambda}}$
 for the service rates. These equations can reconstruct the hazard rates 
 $\nu = \nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 (for each counting process) that best approximate the trajectories of Y under
$\nu = \nu^{\boldsymbol{\eta}} = (\nu^{\boldsymbol{\eta}}_t(\cdot))_{t\geq 0}$
 (for each counting process) that best approximate the trajectories of Y under 
 $\tilde{\mathbb{P}}$
, given the observations
$\tilde{\mathbb{P}}$
, given the observations 
 $\textbf{O}$
. Probabilities over the states of Y may ultimately be derived by means of the master equation (17). In Corollary 1, we further notice that by simplifying the network observation model and easing restrictions on rates under the approximating measure
$\textbf{O}$
. Probabilities over the states of Y may ultimately be derived by means of the master equation (17). In Corollary 1, we further notice that by simplifying the network observation model and easing restrictions on rates under the approximating measure 
 $\mathbb{Q}$
, we retrieve a result analogous to that previously presented in [Reference Opper and Sanguinetti18, Reference Cohn, El-Hay, Friedman and Kupferman7]. However, this is reportedly problematic and can cause a computational bottleneck when reconstructing the jump rates
$\mathbb{Q}$
, we retrieve a result analogous to that previously presented in [Reference Opper and Sanguinetti18, Reference Cohn, El-Hay, Friedman and Kupferman7]. However, this is reportedly problematic and can cause a computational bottleneck when reconstructing the jump rates 
 $\nu_t^{\boldsymbol{\eta}}(y)$
, as these may approach infinity at observation times. When presenting our experimental results at the end of the paper, we will rely on the formulae within Proposition 1.
$\nu_t^{\boldsymbol{\eta}}(y)$
, as these may approach infinity at observation times. When presenting our experimental results at the end of the paper, we will rely on the formulae within Proposition 1.
 Next, we wish to determine the optimal form for 
 $\mathbb{Q}_{\boldsymbol{\lambda}}$
, given a family of hazard rates
$\mathbb{Q}_{\boldsymbol{\lambda}}$
, given a family of hazard rates 
 $\nu$
, such that we best approximate the conditional densities
$\nu$
, such that we best approximate the conditional densities 
 $d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
.
$d\mathbb{P}_{\boldsymbol{\lambda}|\textbf{o}_1,\dots,\textbf{o}_K}$
.
Proposition 2. Let densities for the infinitesimal rates 
 $\boldsymbol{\lambda}$
 be defined with respect to the (Lebesgue) product base measure
$\boldsymbol{\lambda}$
 be defined with respect to the (Lebesgue) product base measure 
 $\mu_{\boldsymbol{\lambda}}$
, so that
$\mu_{\boldsymbol{\lambda}}$
, so that 
 $d\mathbb{Q}_{\boldsymbol{\lambda}}=g_{\boldsymbol{\lambda}} d\mu_{\boldsymbol{\lambda}}$
 with
$d\mathbb{Q}_{\boldsymbol{\lambda}}=g_{\boldsymbol{\lambda}} d\mu_{\boldsymbol{\lambda}}$
 with 
 $ g_{\boldsymbol{\lambda}} = \prod_{\boldsymbol{\eta}\in\mathcal{T}} g^{\boldsymbol{\eta}}_\lambda$
 and marginal densities
$ g_{\boldsymbol{\lambda}} = \prod_{\boldsymbol{\eta}\in\mathcal{T}} g^{\boldsymbol{\eta}}_\lambda$
 and marginal densities 
 $g^{\boldsymbol{\eta}}_{\lambda} = d\mathbb{Q}_{\lambda,\boldsymbol{\eta}}/d\mu_{\lambda}$
. Also, let
$g^{\boldsymbol{\eta}}_{\lambda} = d\mathbb{Q}_{\lambda,\boldsymbol{\eta}}/d\mu_{\lambda}$
. Also, let 
 $\nu_t^{\boldsymbol{\eta}}(y)$
,
$\nu_t^{\boldsymbol{\eta}}(y)$
, 
 $\boldsymbol{\eta}\in\mathcal{T}$
, be some (independent) intensity functions assigned to Y under the approximating variational measure
$\boldsymbol{\eta}\in\mathcal{T}$
, be some (independent) intensity functions assigned to Y under the approximating variational measure 
 $\mathbb{Q}$
. Finally, define
$\mathbb{Q}$
. Finally, define 
 $\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}=\{\lambda_{\boldsymbol{\eta}'} \, :\, \boldsymbol{\eta}'\in\mathcal{T} \backslash \{\boldsymbol{\eta}\}  \}$
 and recall the definitions for network station loads
$\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}=\{\lambda_{\boldsymbol{\eta}'} \, :\, \boldsymbol{\eta}'\in\mathcal{T} \backslash \{\boldsymbol{\eta}\}  \}$
 and recall the definitions for network station loads 
 $\Upsilon$
 in (8) and (9). Then, as
$\Upsilon$
 in (8) and (9). Then, as 
 $\delta \rightarrow 0$
 in (6), the distribution
$\delta \rightarrow 0$
 in (6), the distribution 
 $\mathbb{Q}_{\boldsymbol{\lambda}}$
 that optimizes the lower bound (11) is such that
$\mathbb{Q}_{\boldsymbol{\lambda}}$
 that optimizes the lower bound (11) is such that 
 \begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] -z \cdot \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt} \cdot z^{\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt}\end{align*}
\begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] -z \cdot \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt} \cdot z^{\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt}\end{align*}
up to a normalizing constant, for 
 $z\in\mathbb{R}_+$
 and every
$z\in\mathbb{R}_+$
 and every 
 $\boldsymbol{\eta}\in\mathcal{T}$
.
$\boldsymbol{\eta}\in\mathcal{T}$
.
 
Proof. We again identify a stationary point to the Lagrangian associated with a constrained optimization problem, with respect to arbitrary (positive) densities 
 $g^{\boldsymbol{\eta}}_{\lambda}$
 with
$g^{\boldsymbol{\eta}}_{\lambda}$
 with 
 $ \int_{\mathbb{R}_+} g^{\boldsymbol{\eta}}_\lambda d\mu_{\lambda} =  1 $
. Since
$ \int_{\mathbb{R}_+} g^{\boldsymbol{\eta}}_\lambda d\mu_{\lambda} =  1 $
. Since 
 $\mathbb{P}_{\boldsymbol{\lambda}}$
 and
$\mathbb{P}_{\boldsymbol{\lambda}}$
 and 
 $\mathbb{Q}_{\boldsymbol{\lambda}}$
 share base measures, the object function can be written as
$\mathbb{Q}_{\boldsymbol{\lambda}}$
 share base measures, the object function can be written as 
 \begin{align*}\Phi[g]&= C - \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \mathbb{E}^{\mathbb{Q}}_{\lambda_{\boldsymbol{\eta}}} \big[ \Psi[\lambda_{\boldsymbol{\eta}},g^{\boldsymbol{\eta}}_\lambda] \big] -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  l^{\boldsymbol{\eta}} \Big[ \int_{\mathbb{R}_+} g^{\boldsymbol{\eta}}_\lambda d\mu_{\lambda} -  1\Big],\end{align*}
\begin{align*}\Phi[g]&= C - \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  \mathbb{E}^{\mathbb{Q}}_{\lambda_{\boldsymbol{\eta}}} \big[ \Psi[\lambda_{\boldsymbol{\eta}},g^{\boldsymbol{\eta}}_\lambda] \big] -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}}  l^{\boldsymbol{\eta}} \Big[ \int_{\mathbb{R}_+} g^{\boldsymbol{\eta}}_\lambda d\mu_{\lambda} -  1\Big],\end{align*}
where 
 $\{l^{\boldsymbol{\eta}}\}_{\boldsymbol{\eta}\in\mathcal{T}}$
 are non-functional Lagrange multipliers, and
$\{l^{\boldsymbol{\eta}}\}_{\boldsymbol{\eta}\in\mathcal{T}}$
 are non-functional Lagrange multipliers, and 
 \begin{align*}\Psi[\lambda_{\boldsymbol{\eta}},g^{\boldsymbol{\eta}}_\lambda] &=  \log g_{\lambda}^{\boldsymbol{\eta}}  -   \frac{1}{|\mathcal{T}|}  \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  +   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  + \Xi_{Y_t,\boldsymbol{\eta}} - \nu^{\boldsymbol{\eta}}_t(Y_t) \bigg] dt .\end{align*}
\begin{align*}\Psi[\lambda_{\boldsymbol{\eta}},g^{\boldsymbol{\eta}}_\lambda] &=  \log g_{\lambda}^{\boldsymbol{\eta}}  -   \frac{1}{|\mathcal{T}|}  \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  +   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  + \Xi_{Y_t,\boldsymbol{\eta}} - \nu^{\boldsymbol{\eta}}_t(Y_t) \bigg] dt .\end{align*}
The term C includes the remainder terms in the lower bound in (11) that are independent of the rates 
 $\boldsymbol{\lambda}$
. It follows that
$\boldsymbol{\lambda}$
. It follows that 
 \begin{align*}\frac{\delta\Phi}{\delta g^{\boldsymbol{\eta}}_\lambda} = \ & \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  -\log(g_{\lambda}^{\boldsymbol{\eta}}) - 1 - l^{\boldsymbol{\eta}}\\&-   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  + \Xi_{Y_t,\boldsymbol{\eta}} - \nu^{\boldsymbol{\eta}}_t(Y_t) \bigg] dt\end{align*}
\begin{align*}\frac{\delta\Phi}{\delta g^{\boldsymbol{\eta}}_\lambda} = \ & \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  -\log(g_{\lambda}^{\boldsymbol{\eta}}) - 1 - l^{\boldsymbol{\eta}}\\&-   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[  \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \frac{ \nu^{\boldsymbol{\eta}}_t(Y^{\boldsymbol{\eta}}_t)}{\Xi_{Y_t,{\boldsymbol{\eta}}}}  + \Xi_{Y_t,\boldsymbol{\eta}} - \nu^{\boldsymbol{\eta}}_t(Y_t) \bigg] dt\end{align*}
in its support set 
 $\mathbb{R}_+$
, for all
$\mathbb{R}_+$
, for all 
 $\boldsymbol{\eta}\in\mathcal{T}$
. By equating the above to 0, considering constraints, and analysing the relevant terms up to proportionality, we find that
$\boldsymbol{\eta}\in\mathcal{T}$
. By equating the above to 0, considering constraints, and analysing the relevant terms up to proportionality, we find that 
 \begin{align*}g_{\lambda}^{\boldsymbol{\eta}} \propto \exp\left( \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  +   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \Xi_{Y_t,{\boldsymbol{\eta}}}  - \Xi_{Y_t,\boldsymbol{\eta}} \bigg] dt \right) ,\end{align*}
\begin{align*}g_{\lambda}^{\boldsymbol{\eta}} \propto \exp\left( \mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}} ]  +   \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} \bigg[   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log \Xi_{Y_t,{\boldsymbol{\eta}}}  - \Xi_{Y_t,\boldsymbol{\eta}} \bigg] dt \right) ,\end{align*}
so that
 \begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] - \int_0^T z \cdot \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log (\delta + z\cdot\Upsilon(Y_t,\eta_1,\eta_3)) ] dt}\end{align*}
\begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] - \int_0^T z \cdot \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) \log (\delta + z\cdot\Upsilon(Y_t,\eta_1,\eta_3)) ] dt}\end{align*}
and
 \begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] -z \cdot \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt} \cdot z^{\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt}\end{align*}
\begin{align*}g_{\lambda}^{\boldsymbol{\eta}}(z) \propto e^{\mathbb{E}^{\mathbb{Q}}_{\boldsymbol{\lambda}_{\backslash\boldsymbol{\eta}}} [   \log f_{\boldsymbol{\lambda}}(\textbf{z}) ] -z \cdot \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt} \cdot z^{\int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt}\end{align*}
as 
 $\delta\rightarrow 0$
, for
$\delta\rightarrow 0$
, for 
 $z\in\mathbb{R}_+$
 and every
$z\in\mathbb{R}_+$
 and every 
 $\boldsymbol{\eta}\in\mathcal{T}$
.
$\boldsymbol{\eta}\in\mathcal{T}$
.
 
Corollary 2. (Conjugate prior.) Assume that the prior density on 
 $\boldsymbol{\lambda}$
 also factors across the individual rates, so that
$\boldsymbol{\lambda}$
 also factors across the individual rates, so that 
 $d\mathbb{P}_{\boldsymbol{\lambda}}= \prod_{\boldsymbol{\eta}\in\mathcal{T}} f^{\boldsymbol{\eta}}_\lambda d\mu_{\boldsymbol{\lambda}}$
, where
$d\mathbb{P}_{\boldsymbol{\lambda}}= \prod_{\boldsymbol{\eta}\in\mathcal{T}} f^{\boldsymbol{\eta}}_\lambda d\mu_{\boldsymbol{\lambda}}$
, where 
 $f_{\lambda}^{\boldsymbol{\eta}}$
 for
$f_{\lambda}^{\boldsymbol{\eta}}$
 for 
 $\boldsymbol{\eta}\in\mathcal{T}$
 denote gamma density functions with shape
$\boldsymbol{\eta}\in\mathcal{T}$
 denote gamma density functions with shape 
 $\alpha_{\boldsymbol{\eta}}$
 and rate
$\alpha_{\boldsymbol{\eta}}$
 and rate 
 $\beta_{\boldsymbol{\eta}}$
. Then, as
$\beta_{\boldsymbol{\eta}}$
. Then, as 
 $\delta \rightarrow 0$
 in (6), these are conjugate priors, and
$\delta \rightarrow 0$
 in (6), these are conjugate priors, and 
 \begin{equation*}\lambda_{\boldsymbol{\eta}} \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(\alpha_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{equation*}
\begin{equation*}\lambda_{\boldsymbol{\eta}} \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(\alpha_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{equation*}
for all 
 $\boldsymbol{\eta}\in\mathcal{T}$
.
$\boldsymbol{\eta}\in\mathcal{T}$
.
 Hence, as the network model with negative queues under 
 $\tilde{\mathbb{P}}$
 offers a better approximation of its original counterpart (
$\tilde{\mathbb{P}}$
 offers a better approximation of its original counterpart (
 $\delta \rightarrow 0$
), we may numerically approximate posterior distributions across the infinitesimal rates in
$\delta \rightarrow 0$
), we may numerically approximate posterior distributions across the infinitesimal rates in 
 $\boldsymbol{\lambda}$
, under the variational measure
$\boldsymbol{\lambda}$
, under the variational measure 
 $\mathbb{Q}$
. In the special case with independent gamma prior densities, this is an easily interpretable posterior where the shape and rate parameters depend, respectively, on the integrated expected jump intensities and the integrated expected station loads in (8) and (9).
$\mathbb{Q}$
. In the special case with independent gamma prior densities, this is an easily interpretable posterior where the shape and rate parameters depend, respectively, on the integrated expected jump intensities and the integrated expected station loads in (8) and (9).
6. Applications
The results in this paper suggest an iterative approximation procedure for (5) by means of coordinate ascent. Here, we iteratively update the values of the various rates, functions, and Lagrange multipliers while evaluating and assessing convergence in the lower bound (11) to the log-likelihood. This is a standard approach in variational inference when looking for (local) maxima [Reference Blei, Kucukelbir and McAuliffe4], and the problem is known to be convex.
 Maximizing the bound by calibrating the measure 
 $\mathbb{Q}$
 will yield an approximation to the regular conditional probability of events in
$\mathbb{Q}$
 will yield an approximation to the regular conditional probability of events in 
 $\mathcal{F}$
 under
$\mathcal{F}$
 under 
 $\tilde{\mathbb{P}}$
, conditioned on the observations. This may then be projected over the rates
$\tilde{\mathbb{P}}$
, conditioned on the observations. This may then be projected over the rates 
 $\boldsymbol{\lambda}$
 or the trajectories in X and Y. These projected densities are valid for approximating the conditional distributions of the service rates
$\boldsymbol{\lambda}$
 or the trajectories in X and Y. These projected densities are valid for approximating the conditional distributions of the service rates 
 $\boldsymbol{\mu}$
 and routing probabilities
$\boldsymbol{\mu}$
 and routing probabilities 
 $\mathcal{P}$
 in the original QN system, given observations. The final iterative procedure is described below.
$\mathcal{P}$
 in the original QN system, given observations. The final iterative procedure is described below.
- 
1. Input network observations in (3) and assign a (conjugate) gamma (image) density  $d\mathbb{P}_{\boldsymbol{\lambda}}$
 across job transition intensities $d\mathbb{P}_{\boldsymbol{\lambda}}$
 across job transition intensities $\boldsymbol{\lambda}$
, with shape parameters $\boldsymbol{\lambda}$
, with shape parameters $\alpha_{\boldsymbol{\eta}}$
 and a (shared) rate parameter $\alpha_{\boldsymbol{\eta}}$
 and a (shared) rate parameter $\beta_{\boldsymbol{\eta}}=\beta$
, $\beta_{\boldsymbol{\eta}}=\beta$
, $\boldsymbol{\eta}\in\mathcal{T}$
. $\boldsymbol{\eta}\in\mathcal{T}$
.
- 
2. Define a discretization grid of the time interval [0, T], and operate through interpolation within points in the grid. 
- 
3. Set an arbitrary density  $d\mathbb{Q}_{\boldsymbol{\lambda}}$
. Fix $d\mathbb{Q}_{\boldsymbol{\lambda}}$
. Fix $\kappa_t^{\boldsymbol{\eta}}(y)=0$
, $\kappa_t^{\boldsymbol{\eta}}(y)=0$
, $r_t^{\boldsymbol{\eta}}(y)=1$
, and input (valid) arbitrary starting values $r_t^{\boldsymbol{\eta}}(y)=1$
, and input (valid) arbitrary starting values $\mathbb{Q}(Y^{\boldsymbol{\eta}}_t=y)$
, for all $\mathbb{Q}(Y^{\boldsymbol{\eta}}_t=y)$
, for all $t\in[0,T]$
, $t\in[0,T]$
, $\boldsymbol{\eta}\in\mathcal{T}$
, and $\boldsymbol{\eta}\in\mathcal{T}$
, and $y\in\mathbb{N}_0$
. $y\in\mathbb{N}_0$
.
- 
4. Iterate until convergence: - 
4.1. In each direction  $\boldsymbol{\eta}\in\mathcal{T}$
, numerically compute $\boldsymbol{\eta}\in\mathcal{T}$
, numerically compute $r_t^{\boldsymbol{\eta}}(y)$
 for every $r_t^{\boldsymbol{\eta}}(y)$
 for every $t\in[0,T]$
 and $t\in[0,T]$
 and $y\in\mathbb{N}_0$
, by means of (15)–(16). Then update intensity and slack functions $y\in\mathbb{N}_0$
, by means of (15)–(16). Then update intensity and slack functions $\nu_t^{\boldsymbol{\eta}}(y)$
, $\nu_t^{\boldsymbol{\eta}}(y)$
, $\kappa_t^{\boldsymbol{\eta}}(y)$
 with (14), so that $\kappa_t^{\boldsymbol{\eta}}(y)$
 with (14), so that $k_t^{\boldsymbol{\eta}}(y)=0$
 if 
and $k_t^{\boldsymbol{\eta}}(y)=0$
 if 
and \begin{equation*}\frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}<\bar{\nu} ,\end{equation*}
otherwise. Renew transient state probabilities in Y by means of the master equation (17), for all \begin{equation*}\frac{r_t^{\boldsymbol{\eta}}(y+1)}{r_t^{\boldsymbol{\eta}}(y)} \cdot e^{\mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big]}<\bar{\nu} ,\end{equation*}
otherwise. Renew transient state probabilities in Y by means of the master equation (17), for all \begin{equation*}\nu_t^{\boldsymbol{\eta}}(y) = \bar{\nu}, \quad k_t^{\boldsymbol{\eta}}(y) = \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) \cdot \left[ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \log\frac{\bar{\nu} \cdot r_t^{\boldsymbol{\eta}}(y)}{r_t^{\boldsymbol{\eta}}(y+1)} \right]\end{equation*} \begin{equation*}\nu_t^{\boldsymbol{\eta}}(y) = \bar{\nu}, \quad k_t^{\boldsymbol{\eta}}(y) = \mathbb{Q}(Y_t^{\boldsymbol{\eta}}=y) \cdot \left[ \mathbb{E}^{\mathbb{Q}}_{Y^{\backslash\boldsymbol{\eta}}_t,\boldsymbol{\lambda}} \big[ \log\Xi_{Y_t,{\boldsymbol{\eta}}} | Y_t^{\boldsymbol{\eta}}=y \big] - \log\frac{\bar{\nu} \cdot r_t^{\boldsymbol{\eta}}(y)}{r_t^{\boldsymbol{\eta}}(y+1)} \right]\end{equation*} $t\in [0,T]$
. $t\in [0,T]$
.
- 
4.2. Derive expected intensities and station loads \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ]\end{equation*}
for all directions \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ]\end{equation*}
for all directions \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ],\end{equation*} \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ],\end{equation*} $\boldsymbol{\eta}\in\mathcal{T}$
 and times $\boldsymbol{\eta}\in\mathcal{T}$
 and times $t\in[0,T]$
. Update $t\in[0,T]$
. Update $\mathbb{Q}$
-densities so that 
for all $\mathbb{Q}$
-densities so that 
for all \begin{equation*}\lambda_{\boldsymbol{\eta}} \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(\alpha_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{equation*} \begin{equation*}\lambda_{\boldsymbol{\eta}} \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(\alpha_{\boldsymbol{\eta}} + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{equation*} $\boldsymbol{\eta}\in\mathcal{T}$
. The likelihood ratio in (5) can be computed at this stage. $\boldsymbol{\eta}\in\mathcal{T}$
. The likelihood ratio in (5) can be computed at this stage.
- 
4.3. Evaluate the lower bound (11), given the current densities and infinitesimal rates under the approximating measure  $\mathbb{Q}$
. Assess variation in the bound across iterations and establish convergence. $\mathbb{Q}$
. Assess variation in the bound across iterations and establish convergence.
 
- 
- 
5. Finally, infer the structure of the various service rates and routing probabilities in the QN system. - 
5.1. Note that remains the same across directions \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0]\end{equation*} \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0]\end{equation*} $\boldsymbol{\eta}\in\mathcal{T}$
 with shared origin station. Since 
it holds that $\boldsymbol{\eta}\in\mathcal{T}$
 with shared origin station. Since 
it holds that \begin{equation*}\mu_i^c = \sum_{\boldsymbol{\eta}\in\mathcal{T}^{\rightarrow}_{i,c}} \lambda_{\boldsymbol{\eta}},\end{equation*}
for all \begin{equation*}\mu_i^c = \sum_{\boldsymbol{\eta}\in\mathcal{T}^{\rightarrow}_{i,c}} \lambda_{\boldsymbol{\eta}},\end{equation*}
for all \begin{align*}\textstyle  \mu_i^c \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(&|\mathcal{T}^{\rightarrow}_{i,c}|\cdot\alpha_{\boldsymbol{\eta}} + \sum_{\boldsymbol{\eta}\in\mathcal{T}^{\rightarrow}_{i,c}} \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta \\ &\quad  + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{align*} \begin{align*}\textstyle  \mu_i^c \stackrel{\mathbb{Q}}{\sim}\Gamma\Big(&|\mathcal{T}^{\rightarrow}_{i,c}|\cdot\alpha_{\boldsymbol{\eta}} + \sum_{\boldsymbol{\eta}\in\mathcal{T}^{\rightarrow}_{i,c}} \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ] dt, \beta \\ &\quad  + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ] dt \Big)\end{align*} $ 0\leq i\leq M$
, $ 0\leq i\leq M$
, $c\in\mathcal{C}$
. $c\in\mathcal{C}$
.
- 
5.2. Retrieve distributions for routing probabilities by noting that  $p_{i,j}^c=\lambda_{i,j,c}/\mu_i^c$
 for all $p_{i,j}^c=\lambda_{i,j,c}/\mu_i^c$
 for all $ 0\leq i,j\leq M$
, $ 0\leq i,j\leq M$
, $c\in\mathcal{C}$
. This suggests a Dirichlet distribution. $c\in\mathcal{C}$
. This suggests a Dirichlet distribution.
 
- 
The following examples treat open and closed network models; source code can be found at https://github.com/IkerPerez/variationalQueues.
6.1. Single-class closed network
 We begin by analysing the small closed network example previously shown in Figure 2. We recall that this includes one FCFS service station, with 
 $K_1=1$
 processing unit, along with a delay, together processing a population of N jobs cyclically in a closed loop. All jobs belong to the same class and have equal service rates; we denote by
$K_1=1$
 processing unit, along with a delay, together processing a population of N jobs cyclically in a closed loop. All jobs belong to the same class and have equal service rates; we denote by 
 $\mu_1$
 the job processing rate within the service station. On completion, a job proceeds to the delay station where it awaits for an exponentially distributed time before being routed back to the queue. We use
$\mu_1$
 the job processing rate within the service station. On completion, a job proceeds to the delay station where it awaits for an exponentially distributed time before being routed back to the queue. We use 
 $\mu_0$
 to denote the delay rate, and we note that the arrival rate to the queue is directly proportional to the number of jobs at the delay.
$\mu_0$
 to denote the delay rate, and we note that the arrival rate to the queue is directly proportional to the number of jobs at the delay.
 Both stations are independent and 
 $\mu_0$
 is fixed in order to ensure model identifiability within the service station. In this instance, the network topology is deterministic and trivial, and the evolution of
$\mu_0$
 is fixed in order to ensure model identifiability within the service station. In this instance, the network topology is deterministic and trivial, and the evolution of 
 $X=(X_t)_{t\geq 0}$
 monitors the total number of jobs within the service station, with
$X=(X_t)_{t\geq 0}$
 monitors the total number of jobs within the service station, with 
 $X_0=0$
. The generator Q of X is finite and satisfies
$X_0=0$
. The generator Q of X is finite and satisfies 
 \begin{equation*}Q= \begin{array}{c} \\ \\[-7pt] 0 \\ \\[-7pt] 1\\ \\[-7pt] \vdots\\ \\[-7pt] N-1 \\ \\[-7pt] N \end{array} \left[\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}      & 0 & 1 & 2 & \dots & N-2 & N-1 & N \\ \\[-7pt]     & -N\mu_0 & N\mu_0 & 0 & \dots & 0 & 0 & 0\\ \\[-7pt]    & \mu_1 & -N\mu_0+\mu_0-\mu_1 & (N-1)\mu_0 & \dots & 0 & 0 & 0 \\ \\[-7pt]     & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\ \\[-7pt]    & 0 & 0 & 0 & \dots & \mu_1 & -(\mu_0+\mu_1) & \mu_0 \\ \\[-7pt]    & 0 & 0 & 0 & \dots & 0 & \mu_1 & -\mu_1  \end{array}\right],\end{equation*}
\begin{equation*}Q= \begin{array}{c} \\ \\[-7pt] 0 \\ \\[-7pt] 1\\ \\[-7pt] \vdots\\ \\[-7pt] N-1 \\ \\[-7pt] N \end{array} \left[\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}      & 0 & 1 & 2 & \dots & N-2 & N-1 & N \\ \\[-7pt]     & -N\mu_0 & N\mu_0 & 0 & \dots & 0 & 0 & 0\\ \\[-7pt]    & \mu_1 & -N\mu_0+\mu_0-\mu_1 & (N-1)\mu_0 & \dots & 0 & 0 & 0 \\ \\[-7pt]     & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\ \\[-7pt]    & 0 & 0 & 0 & \dots & \mu_1 & -(\mu_0+\mu_1) & \mu_0 \\ \\[-7pt]    & 0 & 0 & 0 & \dots & 0 & \mu_1 & -\mu_1  \end{array}\right],\end{equation*}
where row and column labels denote the number of jobs in the queueing station. Since 
 $\mu_0$
 is fixed, our interest lies in
$\mu_0$
 is fixed, our interest lies in 
 $\lambda\equiv \lambda_{1,0} = \mu_1 \cdot  p_{1,0} = \mu_1$
. We assign to this rate a distribution
$\lambda\equiv \lambda_{1,0} = \mu_1 \cdot  p_{1,0} = \mu_1$
. We assign to this rate a distribution 
 $\mathbb{P}_\lambda\equiv \lambda_\star\mathbb{P}$
 with (gamma) density
$\mathbb{P}_\lambda\equiv \lambda_\star\mathbb{P}$
 with (gamma) density 
 $f_\lambda$
 such that its hyperparameters fix some reasonably uninformative prior knowledge on the system. We monitor the delay and FCFS service station at fixed and equally spaced times
$f_\lambda$
 such that its hyperparameters fix some reasonably uninformative prior knowledge on the system. We monitor the delay and FCFS service station at fixed and equally spaced times 
 $t_1<\dots<t_K$
 in an interval [0, T]. Here, variables
$t_1<\dots<t_K$
 in an interval [0, T]. Here, variables 
 $O_k$
 are supported on
$O_k$
 are supported on 
 $\mathcal{O}=\{0,\dots,N\}^2$
, and the observation model factors across the network components so that
$\mathcal{O}=\{0,\dots,N\}^2$
, and the observation model factors across the network components so that 
 $f_{O|x}(\textbf{o}) = \tilde{f}_{O|N-x}(o_0) \cdot \tilde{f}_{O|x}(o_1)$
 with
$f_{O|x}(\textbf{o}) = \tilde{f}_{O|N-x}(o_0) \cdot \tilde{f}_{O|x}(o_1)$
 with 
 $\tilde{f}_{O|x}(o)=\frac{\epsilon}{N} + \mathbb{I}(o=x) \cdot (1-\frac{N+1}{N}\epsilon)$
 and
$\tilde{f}_{O|x}(o)=\frac{\epsilon}{N} + \mathbb{I}(o=x) \cdot (1-\frac{N+1}{N}\epsilon)$
 and 
 $$\matrix{
   \mathbb{P}{(O_k^{ - 1}({\bf{o}})|X) = \left\{ {\matrix{
   {{{(1 -{\epsilon})}^2},} \hfill & {{o_0} = N - {X_{{t_k}}},{\mkern 1mu} {o_1} = {X_{{t_k}}},} \hfill  \cr 
   {{{({\epsilon}/N)}^2},} \hfill & {{o_0} \ne N - {X_{{t_k}}},{\mkern 1mu} {o_1} \ne {X_{{t_k}}},} \hfill  \cr 
   {(1 -{\epsilon})\cdot{\epsilon}/N} \hfill & {{\rm{otherwise}}} \hfill  \cr 
 } } \right.} \hfill  \cr 
 } $$
$$\matrix{
   \mathbb{P}{(O_k^{ - 1}({\bf{o}})|X) = \left\{ {\matrix{
   {{{(1 -{\epsilon})}^2},} \hfill & {{o_0} = N - {X_{{t_k}}},{\mkern 1mu} {o_1} = {X_{{t_k}}},} \hfill  \cr 
   {{{({\epsilon}/N)}^2},} \hfill & {{o_0} \ne N - {X_{{t_k}}},{\mkern 1mu} {o_1} \ne {X_{{t_k}}},} \hfill  \cr 
   {(1 -{\epsilon})\cdot{\epsilon}/N} \hfill & {{\rm{otherwise}}} \hfill  \cr 
 } } \right.} \hfill  \cr 
 } $$
for 
 $\epsilon>0$
 and all
$\epsilon>0$
 and all 
 $k=1,\dots,K$
. This accounts for some
$k=1,\dots,K$
. This accounts for some 
 $\epsilon \cdot 100 \%$
 faulty measurements; also, we note that a system with discrete observations is approximated as
$\epsilon \cdot 100 \%$
 faulty measurements; also, we note that a system with discrete observations is approximated as 
 $\epsilon\rightarrow 0$
. Now, assume there exist some sample observations
$\epsilon\rightarrow 0$
. Now, assume there exist some sample observations 
 $\textbf{o}_1,\dots,\textbf{o}_K$
 from a model realization in this closed network. These can easily be produced from (18) given a trajectory
$\textbf{o}_1,\dots,\textbf{o}_K$
 from a model realization in this closed network. These can easily be produced from (18) given a trajectory 
 $(X_t)_{t\in[0,T]}$
. In order to produce the trajectory from (2), given the service rates, we may employ Gillespie’s algorithm [Reference Gillespie10] or faster uniformization-based alternatives [Reference Rao and Teh24].
$(X_t)_{t\in[0,T]}$
. In order to produce the trajectory from (2), given the service rates, we may employ Gillespie’s algorithm [Reference Gillespie10] or faster uniformization-based alternatives [Reference Rao and Teh24].
Remark 1. The transformation 
 $d\mathbb{P}_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}/d\mathbb{P}_{\lambda}$
 is such that, conditioned on
$d\mathbb{P}_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}/d\mathbb{P}_{\lambda}$
 is such that, conditioned on 
 $\textbf{o}_1,\dots,\textbf{o}_K$
, the distribution over
$\textbf{o}_1,\dots,\textbf{o}_K$
, the distribution over 
 $\lambda$
 admits a density carried by the Lebesgue measure
$\lambda$
 admits a density carried by the Lebesgue measure 
 $\mu_\lambda$
, so that
$\mu_\lambda$
, so that 
 $d\mathbb{P}_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}=f_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}\,d\mu_\lambda$
. In this simple example, numerical MCMC procedures [Reference Perez, Hodge and Kypraios22] or basic generator-matrix exponentiations combined with a forward–backward algorithm [Reference Zhao32] can offer such density approximations; however, this is reportedly very inefficient when N is large. Moreover, in involved networks/processes for complex applications (see next example), such alternatives are simply unusable (i.e. they do not scale).
$d\mathbb{P}_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}=f_{\lambda|\textbf{o}_1,\dots,\textbf{o}_K}\,d\mu_\lambda$
. In this simple example, numerical MCMC procedures [Reference Perez, Hodge and Kypraios22] or basic generator-matrix exponentiations combined with a forward–backward algorithm [Reference Zhao32] can offer such density approximations; however, this is reportedly very inefficient when N is large. Moreover, in involved networks/processes for complex applications (see next example), such alternatives are simply unusable (i.e. they do not scale).
 In the following, we analyse simulated data (
 $N=50,\,\mu_0=0.1,\,\epsilon=0.2,\,T=100,\,K=50$
) by assigning a conjugate gamma density to
$N=50,\,\mu_0=0.1,\,\epsilon=0.2,\,T=100,\,K=50$
) by assigning a conjugate gamma density to 
 $\lambda\equiv\mu_1$
, so that
$\lambda\equiv\mu_1$
, so that 
 $\lambda\sim\Gamma(\alpha,\beta)$
 with
$\lambda\sim\Gamma(\alpha,\beta)$
 with 
 $\alpha=5$
 and
$\alpha=5$
 and 
 $\beta=2$
 under the reference measure
$\beta=2$
 under the reference measure 
 $\mathbb{P}$
. Recall that
$\mathbb{P}$
. Recall that 
 $\mu_0$
 is fixed to ensure model identifiability, and
$\mu_0$
 is fixed to ensure model identifiability, and 
 $X_t\in\{0,\dots,N\}$
 denotes the number of jobs in the service station at any time
$X_t\in\{0,\dots,N\}$
 denotes the number of jobs in the service station at any time 
 $t\geq0$
. For later reference, the stationary distribution of the system is given by
$t\geq0$
. For later reference, the stationary distribution of the system is given by 
 \begin{align*}\pi_{\mathbb{P}}(x|\lambda) = \lim_{t\rightarrow\infty}  \mathbb{P}(X_t=x|\lambda) = \left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}\bigg/\sum_{x=0}^N\left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!},\end{align*}
\begin{align*}\pi_{\mathbb{P}}(x|\lambda) = \lim_{t\rightarrow\infty}  \mathbb{P}(X_t=x|\lambda) = \left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}\bigg/\sum_{x=0}^N\left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!},\end{align*}
so that, assuming the observations are sufficiently spaced and the system has reached stationarity, it holds that
 \begin{align}\mathbb{P}\Big(\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)\big|\lambda\Big) \approx \prod_{k=1}^K \sum_{x=0}^N f_{O|x}(\textbf{o}_k)  \pi_{\mathbb{P}}(x|\lambda) = \frac{\prod_{k=1}^K \sum_{x=0}^N f_{O|x}(\textbf{o}_k)  \left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}}{\Big(\sum_{x=0}^N\left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}\Big)^K}, \end{align}
\begin{align}\mathbb{P}\Big(\bigcap_{k=1}^K O_k^{-1}(\textbf{o}_k)\big|\lambda\Big) \approx \prod_{k=1}^K \sum_{x=0}^N f_{O|x}(\textbf{o}_k)  \pi_{\mathbb{P}}(x|\lambda) = \frac{\prod_{k=1}^K \sum_{x=0}^N f_{O|x}(\textbf{o}_k)  \left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}}{\Big(\sum_{x=0}^N\left(\frac{\mu_0}{\lambda}\right)^x\frac{1}{(N-x)!}\Big)^K}, \end{align}
where 
 $f_{O|x}(\textbf{o}_k)$
 is as defined in (18). Note that here
$f_{O|x}(\textbf{o}_k)$
 is as defined in (18). Note that here 
 $|\mathcal{T}|=2$
, and the process
$|\mathcal{T}|=2$
, and the process 
 $Y=(Y^{0,1}_t,Y^{1,0}_t)_{t\geq 0}$
 monitors transitions between the delay and service station, in both the directions
$Y=(Y^{0,1}_t,Y^{1,0}_t)_{t\geq 0}$
 monitors transitions between the delay and service station, in both the directions 
 $0\rightarrow 1$
 and
$0\rightarrow 1$
 and 
 $1\rightarrow 0$
. The lower bound to the log-likelihood in (11) reduces to
$1\rightarrow 0$
. The lower bound to the log-likelihood in (11) reduces to 
 \begin{align}\log\tilde{\mathbb{P}}(\textbf{O}) \geq \ & \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  \mathbb{P}(O^{-1}_k(\textbf{o}_k) |Y^{0,1}_{t_k} - Y^{1,0}_{t_k})  \big]\nonumber\\[3pt] &-\mathbb{E}^{\mathbb{Q}}_{\lambda} \bigg[  \log\frac{ g_{\lambda}^{1,0} }{ d\mathbb{P}_{\lambda}   } \bigg] + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\lambda} \big[ \Psi[Y_t,\nu_t,\lambda] \big] dt  \end{align}
\begin{align}\log\tilde{\mathbb{P}}(\textbf{O}) \geq \ & \sum_{k=1}^K\mathbb{E}^{\mathbb{Q}}_{Y_{t_k}} \big[ \log  \mathbb{P}(O^{-1}_k(\textbf{o}_k) |Y^{0,1}_{t_k} - Y^{1,0}_{t_k})  \big]\nonumber\\[3pt] &-\mathbb{E}^{\mathbb{Q}}_{\lambda} \bigg[  \log\frac{ g_{\lambda}^{1,0} }{ d\mathbb{P}_{\lambda}   } \bigg] + \int_0^T  \mathbb{E}^{\mathbb{Q}}_{Y_t,\lambda} \big[ \Psi[Y_t,\nu_t,\lambda] \big] dt  \end{align}
with
 \begin{align*}\Psi[Y_t,\nu_t,\lambda] = \ & \nu^{1,0}_t (Y^{1,0}_t) + \nu^{0,1}_t (Y^{0,1}_t)  - 2\delta - \lambda\cdot\mathbb{I}(Y^{0,1}_{t_k} - Y^{1,0}_{t_k} > 0) \\[3pt] &- \mu_0\cdot (N + Y^{1,0}_{t_k} - Y^{0,1}_{t_k})   -  \nu^{1,0}_t (Y^{1,0}_t) \log \frac{  \nu^{1,0}_t (Y^{1,0}_t)}{\delta + \lambda\cdot\mathbb{I}(Y^{0,1}_{t_k} - Y^{1,0}_{t_k} > 0)} \\[3pt] &- \nu^{0,1}_t (Y^{0,1}_t) \log \frac{ \nu^{0,1}_t (Y^{0,1}_t) }{\delta + \mu_0\cdot (N + Y^{1,0}_{t_k} - Y^{0,1}_{t_k}) },\end{align*}
\begin{align*}\Psi[Y_t,\nu_t,\lambda] = \ & \nu^{1,0}_t (Y^{1,0}_t) + \nu^{0,1}_t (Y^{0,1}_t)  - 2\delta - \lambda\cdot\mathbb{I}(Y^{0,1}_{t_k} - Y^{1,0}_{t_k} > 0) \\[3pt] &- \mu_0\cdot (N + Y^{1,0}_{t_k} - Y^{0,1}_{t_k})   -  \nu^{1,0}_t (Y^{1,0}_t) \log \frac{  \nu^{1,0}_t (Y^{1,0}_t)}{\delta + \lambda\cdot\mathbb{I}(Y^{0,1}_{t_k} - Y^{1,0}_{t_k} > 0)} \\[3pt] &- \nu^{0,1}_t (Y^{0,1}_t) \log \frac{ \nu^{0,1}_t (Y^{0,1}_t) }{\delta + \mu_0\cdot (N + Y^{1,0}_{t_k} - Y^{0,1}_{t_k}) },\end{align*}
so that it contains only two hazard functions in the approximating measure 
 $\mathbb{Q}$
, namely
$\mathbb{Q}$
, namely 
 $\nu^{0,1}$
 and
$\nu^{0,1}$
 and 
 $\nu^{1,0}$
. In (20), we again notice that the lower bound is dominated by three distinguishable components: (i) the expected log-observations, (ii) the KL divergence across the service rate density, and (iii) a weighted
$\nu^{1,0}$
. In (20), we again notice that the lower bound is dominated by three distinguishable components: (i) the expected log-observations, (ii) the KL divergence across the service rate density, and (iii) a weighted 
 $\mathbb{P}$
-to-
$\mathbb{P}$
-to-
 $\mathbb{Q}$
 divergence in the expected path likelihood, further integrated along the entire network trajectory. The differential equations for functionals in (16) reduce to
$\mathbb{Q}$
 divergence in the expected path likelihood, further integrated along the entire network trajectory. The differential equations for functionals in (16) reduce to 
 \begin{align*}\frac{dr_t^{0,1}(y)}{dt} &= r_t^{0,1}(y) \big(\delta + \mu_0 \cdot \mathbb{E}^{\mathbb{Q}}_{Y^{1,0}_t} \big[ (Y^{1,0}_t - y)\vee 0)  \big] \big) \\[3pt]&- \frac{1+ k_t^{0,1}(y)/\mathbb{Q}(Y_t^{0,1}=y)}{e^{ k_t^{0,1}(y)/\mathbb{Q}(Y_t^{0,1}=y)}}  r_t^{0,1}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{1,0}_t} \big[ \log(\delta + \mu_0\cdot[(Y^{1,0}_t-y)\vee 0]) \big]}\end{align*}
\begin{align*}\frac{dr_t^{0,1}(y)}{dt} &= r_t^{0,1}(y) \big(\delta + \mu_0 \cdot \mathbb{E}^{\mathbb{Q}}_{Y^{1,0}_t} \big[ (Y^{1,0}_t - y)\vee 0)  \big] \big) \\[3pt]&- \frac{1+ k_t^{0,1}(y)/\mathbb{Q}(Y_t^{0,1}=y)}{e^{ k_t^{0,1}(y)/\mathbb{Q}(Y_t^{0,1}=y)}}  r_t^{0,1}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{1,0}_t} \big[ \log(\delta + \mu_0\cdot[(Y^{1,0}_t-y)\vee 0]) \big]}\end{align*}
and
 \begin{align*}\frac{dr_t^{1,0}(y)}{dt} &= r_t^{1,0}(y)  \big(\delta + \mathbb{E}^{\mathbb{Q}}_{\lambda}[\lambda]\cdot \mathbb{Q}(Y^{0,1}_t>y) \big) \\&\quad - \frac{1+k_t^{1,0}(y)/\mathbb{Q}(Y_t^{1,0}=y)}{e^{ k_t^{1,0}(y)/\mathbb{Q}(Y_t^{1,0}=y)}}  r_t^{1,0}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{0,1}_t,\boldsymbol{\lambda}} \big[ \log(\delta + \lambda \cdot \mathbb{I}(Y^{0,1}_t>y)) \big]} .\end{align*}
\begin{align*}\frac{dr_t^{1,0}(y)}{dt} &= r_t^{1,0}(y)  \big(\delta + \mathbb{E}^{\mathbb{Q}}_{\lambda}[\lambda]\cdot \mathbb{Q}(Y^{0,1}_t>y) \big) \\&\quad - \frac{1+k_t^{1,0}(y)/\mathbb{Q}(Y_t^{1,0}=y)}{e^{ k_t^{1,0}(y)/\mathbb{Q}(Y_t^{1,0}=y)}}  r_t^{1,0}(y+1) e^{\mathbb{E}^{\mathbb{Q}}_{Y^{0,1}_t,\boldsymbol{\lambda}} \big[ \log(\delta + \lambda \cdot \mathbb{I}(Y^{0,1}_t>y)) \big]} .\end{align*}
 In Figure 4 (left) we observe the evolution of the lower bound (20) during the iterative inferential procedure, for a sufficiently small and negligible value of 
 $\delta$
. There, we notice that the procedure has converged to a (local) optimum within approximately 13 iterations. On the right-hand side of the figure, we further observe credible intervals and point estimates across iterations for
$\delta$
. There, we notice that the procedure has converged to a (local) optimum within approximately 13 iterations. On the right-hand side of the figure, we further observe credible intervals and point estimates across iterations for 
 $\lambda$
, under the approximating measure
$\lambda$
, under the approximating measure 
 $\mathbb{Q}$
. Displayed in the back (in grey) are the estimates obtained using the proposed method in this paper; in the front (in blue), we find approximations extracted by adapting variational procedures in [Reference Opper and Sanguinetti18], to allow for prior knowledge and conjugacy properties.
$\mathbb{Q}$
. Displayed in the back (in grey) are the estimates obtained using the proposed method in this paper; in the front (in blue), we find approximations extracted by adapting variational procedures in [Reference Opper and Sanguinetti18], to allow for prior knowledge and conjugacy properties.

Figure 4: Left, evolution of lower bound to the log-likelihood during the inferential procedure. Right, 95% credible intervals and point estimates for the service rate 
 $\lambda$
 under
$\lambda$
 under 
 $\mathbb{Q}$
; the back (grey) corresponds to the proposed method, while the front (blue) is for an (adapted) traditional variational procedure.
$\mathbb{Q}$
; the back (grey) corresponds to the proposed method, while the front (blue) is for an (adapted) traditional variational procedure.
 Next, in Figure 5 (left) we find the 
 $\mathbb{P}$
-prior density for
$\mathbb{P}$
-prior density for 
 $\lambda$
 (flat density in the back of the figure) with the final-iteration
$\lambda$
 (flat density in the back of the figure) with the final-iteration 
 $\mathbb{Q}$
-posterior superimposed (grey density in the front); the additional red and blue densities represent
$\mathbb{Q}$
-posterior superimposed (grey density in the front); the additional red and blue densities represent
- 
• the posterior density obtained through Metropolis–Hastings MCMC, by means of strong stationarity assumptions leading to the likelihood function shown in (19), and 
- 
• the approximate posterior obtained by adapting variational procedures in [Reference Opper and Sanguinetti18]. 

Figure 5: Left, prior density (light grey flat density in the back) and posterior density (dark grey density in the front) for 
 $\lambda$
, along with MCMC (red) and traditional variational (blue) density estimates. The black dot on the horizontal axis represents the original value in the network simulation. Right, network observations along with mean-average network trajectory and 95% credible interval for job counts in the service station; in grey, our proposed method, in blue, existing variational alernative method.
$\lambda$
, along with MCMC (red) and traditional variational (blue) density estimates. The black dot on the horizontal axis represents the original value in the network simulation. Right, network observations along with mean-average network trajectory and 95% credible interval for job counts in the service station; in grey, our proposed method, in blue, existing variational alernative method.
On the right-hand side we have the network observations on both the service station and the delay. Delay observations are displayed by subtracting their value from the job population N (which represents a second measurement on the service station). Whenever both observations match, these are displayed with a large dot. Along with it, we find the following:
- 
• in grey, a mean-average network trajectory and 95% credible interval for job counts on the service station  $X_t=(Y^{0,1}_t-Y^{1,0}_t)_{t\geq 0}$
, under the approximating measure $X_t=(Y^{0,1}_t-Y^{1,0}_t)_{t\geq 0}$
, under the approximating measure $\mathbb{Q}$
 and with methods introduced in this paper; $\mathbb{Q}$
 and with methods introduced in this paper;
- 
• in blue, a similar confidence interval and mean-average path obtained using benchmark methods in [Reference Opper and Sanguinetti18]. 
Noticeably, the average trajectory under our proposed variational technique flows through the most informative observations (thick dots), and the credible interval widens up to account for some faulty measurements within either network station. On the other hand, traditional variational approaches quickly converge to a local optimum and restrict mean-average dynamics, further compressing confidence intervals in regions with noisy data. In the next example, we notice how this poses a problem for traditional methods; that is, within complex and synchronized stochastic processes, we will fail to obtain sensible estimates for network parameters. Moreover, in our next example, inference by MCMC/forward–backward methods is virtually intractable (cf. [Reference Perez, Hodge and Kypraios22]).
6.2. Multi-class parallel tandems with bottleneck and service priorities
 We analyse an open multi-class QN as pictured in Figure 6. In this network, there exists two classes (
 $c=1,2$
) of jobs that simultaneously transit the system. The first class consists of high-priority jobs with low arrival and service intensity rates. The second class consists of low-priority jobs with high arrival and service rates. Once a job enters the system, a probabilistic routing junction (pictured as a square within the figure) sends this job through either a PS or a priority-FCFS tandem; later, it will be serviced within an infinite station before leaving the network. In the top processor-sharing tandem, each station has five processing units. These will fraction their working capacity as seen in (8), in order to simultaneously service all jobs regardless of their class and priority level; however, service rates will differ depending on the job class. By contrast, the bottom tandem includes two FCFS stations with a single processing unit and priority scheduling. Here, low-priority jobs are serviced only if each station is entirely empty of any high-priority jobs. Consequently, the station loads in (9) are rewritten so that
$c=1,2$
) of jobs that simultaneously transit the system. The first class consists of high-priority jobs with low arrival and service intensity rates. The second class consists of low-priority jobs with high arrival and service rates. Once a job enters the system, a probabilistic routing junction (pictured as a square within the figure) sends this job through either a PS or a priority-FCFS tandem; later, it will be serviced within an infinite station before leaving the network. In the top processor-sharing tandem, each station has five processing units. These will fraction their working capacity as seen in (8), in order to simultaneously service all jobs regardless of their class and priority level; however, service rates will differ depending on the job class. By contrast, the bottom tandem includes two FCFS stations with a single processing unit and priority scheduling. Here, low-priority jobs are serviced only if each station is entirely empty of any high-priority jobs. Consequently, the station loads in (9) are rewritten so that 
 \begin{align*}\textstyle\textstyle\Upsilon(y,i,1) = 1 \wedge x_{i,1}   \quad \text{and} \quad \textstyle\textstyle\Upsilon(y,i,2) = (1 \wedge x_{i,2}) \cdot \mathbb{I}(x_{i,1} <1)\end{align*}
\begin{align*}\textstyle\textstyle\Upsilon(y,i,1) = 1 \wedge x_{i,1}   \quad \text{and} \quad \textstyle\textstyle\Upsilon(y,i,2) = (1 \wedge x_{i,2}) \cdot \mathbb{I}(x_{i,1} <1)\end{align*}
at stations 
 $i\in\{2,4\}$
 and for any
$i\in\{2,4\}$
 and for any 
 $y\in\mathcal{S}^Y$
, where we recall that
$y\in\mathcal{S}^Y$
, where we recall that 
 \begin{equation*}x_{i,c} = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} y_{{\boldsymbol{\eta}}} \end{equation*}
\begin{equation*}x_{i,c} = \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\leftarrow}_{i,c}}  y_{{\boldsymbol{\eta}}} -  \sum_{{\boldsymbol{\eta}}\in\mathcal{T}^{\rightarrow}_{i,c}} y_{{\boldsymbol{\eta}}} \end{equation*}
and thus 
 $x_{2,c}=y_{0,2,c}-y_{2,4,c}, x_{4,c}=y_{2,4,c}-y_{4,5,c}$
, for
$x_{2,c}=y_{0,2,c}-y_{2,4,c}, x_{4,c}=y_{2,4,c}-y_{4,5,c}$
, for 
 $c=1,2$
. Due to the presence of service priorities, the ordering of jobs within the queue is irrelevant (this is also the case with random order disciplines); hence, our inferential framework allows for different service rates assigned to jobs in each class. Finally, the last service station includes an infinite number of processing units, and processing rates also differ depending on the job class.
$c=1,2$
. Due to the presence of service priorities, the ordering of jobs within the queue is irrelevant (this is also the case with random order disciplines); hence, our inferential framework allows for different service rates assigned to jobs in each class. Finally, the last service station includes an infinite number of processing units, and processing rates also differ depending on the job class.

Figure 6: Open QN with one routing junction (pictured as a square) and five service stations with varied disciplines and processing rates.
 We analyse synthetic data created during a time interval [0, T] (
 $T=100$
), with arrival intensities
$T=100$
), with arrival intensities 
 $\mu_0^1=0.5$
,
$\mu_0^1=0.5$
, 
 $\mu_0^2=3$
, routing probabilities
$\mu_0^2=3$
, routing probabilities 
 $p^c_{0,i}=0.5$
,
$p^c_{0,i}=0.5$
, 
 $i,c\in\{1,2\}$
, and service rates as shown in Table 1. We collect a reduced set of noiseless and equally spaced observations with
$i,c\in\{1,2\}$
, and service rates as shown in Table 1. We collect a reduced set of noiseless and equally spaced observations with 
 $K=50$
; these are essentially snapshots of the full system state across its service stations and job classes, so that
$K=50$
; these are essentially snapshots of the full system state across its service stations and job classes, so that 
 $\mathcal{O}=\mathbb{N}^{10}$
, and the observation density in (3) is defined with
$\mathcal{O}=\mathbb{N}^{10}$
, and the observation density in (3) is defined with 
 \begin{equation*}f_{O|x}(\textbf{o}) = \prod_{i=1}^5 \prod_{c=1}^2 \mathbb{I}(x_{i,c} = o_{i,c})\end{equation*}
\begin{equation*}f_{O|x}(\textbf{o}) = \prod_{i=1}^5 \prod_{c=1}^2 \mathbb{I}(x_{i,c} = o_{i,c})\end{equation*}
for 
 $x\in\mathcal{S}$
, where
$x\in\mathcal{S}$
, where 
 $o_{i,c}$
 is an indexed observation in the element
$o_{i,c}$
 is an indexed observation in the element 
 $\textbf{o}$
 denoting the class-c queue length at station
$\textbf{o}$
 denoting the class-c queue length at station 
 $i>0$
. (Source code for the data simulation process may be found at https://github.com/IkerPerez/variationalQueues.) Within the inferential procedure, this observation likelihood must be approximated with some regularized variant similar to (18), while taking
$i>0$
. (Source code for the data simulation process may be found at https://github.com/IkerPerez/variationalQueues.) Within the inferential procedure, this observation likelihood must be approximated with some regularized variant similar to (18), while taking 
 $\epsilon\rightarrow 0$
. Next, we assign conjugate gamma priors to the various service intensities; in order to ensure identifiability in the problem, arrival rates and routing probabilities are fixed, and we focus this inferential task on the various service stations. Hence
$\epsilon\rightarrow 0$
. Next, we assign conjugate gamma priors to the various service intensities; in order to ensure identifiability in the problem, arrival rates and routing probabilities are fixed, and we focus this inferential task on the various service stations. Hence 
 $\boldsymbol{\lambda}\equiv \{\mu^c_i\, :\, c=1,2 \; \text{and} \; i=1,\cdots,5\}$
, and we set
$\boldsymbol{\lambda}\equiv \{\mu^c_i\, :\, c=1,2 \; \text{and} \; i=1,\cdots,5\}$
, and we set 
 $\lambda_{\boldsymbol{\eta}}\sim\Gamma(1,0.3)$
 under the reference measure
$\lambda_{\boldsymbol{\eta}}\sim\Gamma(1,0.3)$
 under the reference measure 
 $\mathbb{P}$
, for all
$\mathbb{P}$
, for all 
 $\boldsymbol{\eta}\in\mathcal{T}$
.
$\boldsymbol{\eta}\in\mathcal{T}$
.
Table 1: Summary statistics for posterior service rates in the QN in Figure 6.

In the following, we omit the cumbersome mathematical details related to this complex model formulation, and we focus on discussing prior choices, calibration of the algorithm, results, and method comparisons following the inferential procedure.
Remarks on using MCMC data-augmentation for inference. Transient inference in a stochastic system with priorities is especially challenging, because of the strong dependencies this generates on the queue lengths across the stations and classes. Specifically,
- 
• data-augmentation methods relying on MCMC techniques do not scale (cf. [Reference Sutton and Jordan27; Reference Perez, Hodge and Kypraios22]), as dependences yield very autocorrelated output chains; 
- 
• there exist no analytic product-form distributions to enable approximate inferential methods under assumptions of system stationarity, as discussed in the previous example; and 
- 
• generator-matrix exponentiations with a forward–backward algorithm [Reference Zhao32] are simply unscalable to such large multivariate systems. 
 
Remarks on using benchmark variational methods for inference. Note that traditional variational methods (cf. [Reference Opper and Sanguinetti18, Reference Cohn, El-Hay, Friedman and Kupferman7]) are centred around populations or lengths in the individual queues. In this example, populations may not be factorized under an approximating measure 
 $\mathbb{Q}$
, since system jumps are synchronized; i.e., a jump down in one queue corresponds to a jump up in another. As a consequence, pairs of approximating rates under
$\mathbb{Q}$
, since system jumps are synchronized; i.e., a jump down in one queue corresponds to a jump up in another. As a consequence, pairs of approximating rates under 
 $\mathbb{Q}$
 will be interlinked with the same real transition rate under
$\mathbb{Q}$
 will be interlinked with the same real transition rate under 
 $\mathbb{P}$
, and derivations such as the lower bound in (11) or Equations (14)–(16) are unattainable. For the sake of completeness and comparisons, we adapt existing variational algorithms to the current task; however, we must make the following changes:
$\mathbb{P}$
, and derivations such as the lower bound in (11) or Equations (14)–(16) are unattainable. For the sake of completeness and comparisons, we adapt existing variational algorithms to the current task; however, we must make the following changes:
- 
• We allow a factorization  $\mathbb{Q}(X_t=\textbf{x})=\prod_{i}\mathbb{Q}(X_{i,t}=x_i)$
, so that jobs may be virtually created and removed in any queue; i.e., jobs do not transit a network, but rather reach and depart servers individually. The full population of jobs in the network is not preserved. $\mathbb{Q}(X_t=\textbf{x})=\prod_{i}\mathbb{Q}(X_{i,t}=x_i)$
, so that jobs may be virtually created and removed in any queue; i.e., jobs do not transit a network, but rather reach and depart servers individually. The full population of jobs in the network is not preserved.
- 
• We duplicate intensities for transitions in the real model; that is, we have a rate for (i) a job departing a queue and (ii) the job arriving at another. Technically, a job could arrive at a new server before it departs the previous one; synchronization is lost; point estimates for parameters are averaged across pairs and weighted for network load. 
As we will see, this leads to drastic performance issues that render the method unusable.

Figure 7: 95% credible intervals for queue lengths across the service stations. Dark (light) grey corresponds to high-priority (low-priority) jobs. Bottom right panel shows expected jump intensity and station load in the direction 
 $\eta=(0,1,1)$
.
$\eta=(0,1,1)$
.
6.2.1 Algorithm calibration
Within our method, prior choices in the system state Y must initially accommodate a strictly positive, albeit not necessarily large, likelihood for low-priority jobs to be serviced at any point in time. Here, we achieve this by assigning Poisson process priors to task transition counts in Y; that is, we first run the master equation (17) with some user-specified constant intensity rates. This creates monotone mean-average queue lengths in the service stations, and we ensure that they flow aligned to the network observations in every instance.
 Also, the presence of strong temporal dependencies will often trigger the approximating rates 
 $\nu$
 in (14) to become unreasonably large, ultimately rendering the algorithm computationally infeasible. This is a phenomenon also observed in [Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Opper and Sanguinetti18], within the context of simpler stochastic dynamics. To ensure computational tractability, we exploit the capping functionals k as in introduced in Proposition 1 and set a global rate cap of
$\nu$
 in (14) to become unreasonably large, ultimately rendering the algorithm computationally infeasible. This is a phenomenon also observed in [Reference Cohn, El-Hay, Friedman and Kupferman7, Reference Opper and Sanguinetti18], within the context of simpler stochastic dynamics. To ensure computational tractability, we exploit the capping functionals k as in introduced in Proposition 1 and set a global rate cap of 
 $\bar{\nu}=50$
. Furthermore, we run the differential equations for r in (16) in log-form. Specific details can be found within the aforementioned source code.
$\bar{\nu}=50$
. Furthermore, we run the differential equations for r in (16) in log-form. Specific details can be found within the aforementioned source code.
6.2.2 Results
 In the plots in Figure 7, with the exception of the bottom right one, we observe 95% credible intervals for the queue length processes 
 $X^{i,c}_t$
 over time, across the various service stations and job classes. In the figure, intervals in dark grey relate to high-priority jobs, and their corresponding queue length observations are represented by black circles. This information is superimposed over its analogue for low-priority jobs, where intervals are coloured in light grey and observations represented by small diamonds. These interval approximations ignore small positive densities that are sometimes assigned to negative queue lengths. Note that this is a consequence of employing counts across job transitions in Y as a basis for inference on X; however, we recall that this is necessary to overcome the coupling challenges described in Sections 2 and 4. Overall, we note that the variational flow captures the collected observations well, with some few exceptions in the stations with priority scheduling; hence, it offers a good basis for building approximate estimates for parameters and the likelihood ratio (5).
$X^{i,c}_t$
 over time, across the various service stations and job classes. In the figure, intervals in dark grey relate to high-priority jobs, and their corresponding queue length observations are represented by black circles. This information is superimposed over its analogue for low-priority jobs, where intervals are coloured in light grey and observations represented by small diamonds. These interval approximations ignore small positive densities that are sometimes assigned to negative queue lengths. Note that this is a consequence of employing counts across job transitions in Y as a basis for inference on X; however, we recall that this is necessary to overcome the coupling challenges described in Sections 2 and 4. Overall, we note that the variational flow captures the collected observations well, with some few exceptions in the stations with priority scheduling; hence, it offers a good basis for building approximate estimates for parameters and the likelihood ratio (5).
The bottom right plot in Figure 7 shows an overview of the expected jump intensity
 \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ]\end{equation*}
\begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y^{\boldsymbol{\eta}}_t} [   \nu^{\boldsymbol{\eta}}_t (Y^{\boldsymbol{\eta}}_t) ]\end{equation*}
and station load
 \begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ]\end{equation*}
\begin{equation*}\mathbb{E}^{\mathbb{Q}}_{Y_t} [ \Upsilon(Y_t,\eta_1,\eta_3)\vee 0  ]\end{equation*}
in the direction 
 $\boldsymbol{\eta}=(0,1,1)$
 at times
$\boldsymbol{\eta}=(0,1,1)$
 at times 
 $t\in[0,T]$
. The sharp peaks in the intensities come at observation times, ensuring that the process density transits through the observations. Finally, we notice that the expected station load differs from the estimate of the high-priority queue length in station 1, as this process combines and weights the queue length across the two priorities according to (8).
$t\in[0,T]$
. The sharp peaks in the intensities come at observation times, ensuring that the process density transits through the observations. Finally, we notice that the expected station load differs from the estimate of the high-priority queue length in station 1, as this process combines and weights the queue length across the two priorities according to (8).
 Next, we find in Table 1 summary statistics for the posterior service rates under the approximating variational measure 
 $\mathbb{Q}$
, along with point estimates obtained by adapting benchmark variational techniques in [Reference Opper and Sanguinetti18]. We observe that the proposed framework allows us to gain a good overview of the system properties and variability in the processing speed across the various stations; by contrast, existing methods are far from offering reasonable approximations to system behaviour (they instead seem to construct an averaged estimate of network flow). Noticeably, there exist a few significant deviations from real values, within the posterior estimates for high-priority service rates in PS stations. This is likely due to a combination of sampling variance, high model complexity, and the limitations of such approximate variational procedures for transient analyses of stochastic processes.
$\mathbb{Q}$
, along with point estimates obtained by adapting benchmark variational techniques in [Reference Opper and Sanguinetti18]. We observe that the proposed framework allows us to gain a good overview of the system properties and variability in the processing speed across the various stations; by contrast, existing methods are far from offering reasonable approximations to system behaviour (they instead seem to construct an averaged estimate of network flow). Noticeably, there exist a few significant deviations from real values, within the posterior estimates for high-priority service rates in PS stations. This is likely due to a combination of sampling variance, high model complexity, and the limitations of such approximate variational procedures for transient analyses of stochastic processes.
7. Discussion
In this paper, we have enabled the variational evaluation of approximating measures for partially-observed coupled systems of jump stochastic processes, with a focus on mixed systems of QNs. Furthermore, we have presented a flexible approximate Bayesian framework, capable of overcoming the challenges posed by coupling properties, and applicable in scenarios where existing MCMC or variational solutions are unusable. To achieve this goal, we have built on existing variational theory (see [Reference Opper and Sanguinetti18, Reference Cohn, El-Hay, Friedman and Kupferman7]) and discussed an alternate optimization procedure with slack variables and inequality constraints that can address computational limitations within existing techniques. Notably, results within this paper contribute to existing Bayesian statistical literature in [Reference Sutton and Jordan27, Reference Wang, Casale and Sutton29, Reference Perez, Hodge and Kypraios22] and allow for the study of the latent stochastic behaviour across complex mixed network models, by means of an augmented process for interactions in the resources.
Even though the proposed framework relies on an approximated network model as a basis for inference (which ensures the absolute continuity across base measures), and while it further analyses queue lengths by means of augmented job transitions in the resources, we have shown that we can reliably capture the finite-dimensional posterior distributions of the various marginal stochastic processes and offer a good overview of the network structure and likely flow of workload. This is important as it can enable evaluation and uncertainty quantification tasks in several networked systems, found in many application domains, where full data observations may be hard to retrieve. Current state-of-the-art alternatives rely on strong assumptions leading to stationary analyses of such systems, or use alternate MCMC procedures that reportedly encounter limitations due to existing computational constraints [Reference Sutton and Jordan27, Reference Perez, Hodge and Kypraios22].
Appendix A. Construction
Let 
 $(\Omega,\mathcal{F})$
 be a measurable space with the regular conditional probability property; also, let
$(\Omega,\mathcal{F})$
 be a measurable space with the regular conditional probability property; also, let 
 $0\leq t_1 < \dots < t_K \leq T$
 be some fixed observation times, with
$0\leq t_1 < \dots < t_K \leq T$
 be some fixed observation times, with 
 $T>0$
. In a standard QN with M stations,
$T>0$
. In a standard QN with M stations, 
 $\Omega$
 may denote a product set supporting instantaneous rates, trajectories, and observations, and
$\Omega$
 may denote a product set supporting instantaneous rates, trajectories, and observations, and 
 $\mathcal{F}$
 the corresponding product
$\mathcal{F}$
 the corresponding product 
 $\sigma$
-algebra. The space of rates and observations will consist of trivial Borel algebras and power sets, so that
$\sigma$
-algebra. The space of rates and observations will consist of trivial Borel algebras and power sets, so that 
 $\boldsymbol{\lambda}$
 is an
$\boldsymbol{\lambda}$
 is an 
 $(\mathbb{R}_+^n,\mathcal{B}(\mathbb{R}_+^n))$
-valued random variable of rates in the infinitesimal generator matrix Q of X, where
$(\mathbb{R}_+^n,\mathcal{B}(\mathbb{R}_+^n))$
-valued random variable of rates in the infinitesimal generator matrix Q of X, where 
 $n\in\mathbb{N}$
 denotes an arbitrary number determined by the network topology. In addition,
$n\in\mathbb{N}$
 denotes an arbitrary number determined by the network topology. In addition, 
 $\{O_k : k=1,\dots,K\}$
 corresponds to random measurement variables for the network monitoring activity, each defined on
$\{O_k : k=1,\dots,K\}$
 corresponds to random measurement variables for the network monitoring activity, each defined on 
 $(\mathcal{O},\mathcal{P}(\mathcal{O}))$
, where
$(\mathcal{O},\mathcal{P}(\mathcal{O}))$
, where 
 $\mathcal{O}$
 denotes an arbitrary countable support set for observations in every service station. A network trajectory
$\mathcal{O}$
 denotes an arbitrary countable support set for observations in every service station. A network trajectory 
 $X=(X_t)_{0\leq t\leq T}$
 is an
$X=(X_t)_{0\leq t\leq T}$
 is an 
 $(\mathcal{S},\mathcal{P}(\mathcal{S}))$
-valued stochastic process with a countably infinite support set
$(\mathcal{S},\mathcal{P}(\mathcal{S}))$
-valued stochastic process with a countably infinite support set 
 $\mathcal{S}$
. Note that this is a piecewise deterministic jump process, so that
$\mathcal{S}$
. Note that this is a piecewise deterministic jump process, so that 
 $X=(\textbf{t},\textbf{x})$
 is formed by a sequence of transition times
$X=(\textbf{t},\textbf{x})$
 is formed by a sequence of transition times 
 $\textbf{t}$
 along with states
$\textbf{t}$
 along with states 
 $\textbf{x}$
. Every pair
$\textbf{x}$
. Every pair 
 $(\textbf{t},\textbf{x})$
 can be further defined as a random variable on a measurable space
$(\textbf{t},\textbf{x})$
 can be further defined as a random variable on a measurable space 
 $(\mathcal{X},\Sigma_\mathcal{X})$
, with
$(\mathcal{X},\Sigma_\mathcal{X})$
, with 
 $\mathcal{X}=\cup_{i=0}^\infty ([0,T]\times\mathcal{S})^i$
 and
$\mathcal{X}=\cup_{i=0}^\infty ([0,T]\times\mathcal{S})^i$
 and 
 $\Sigma_\mathcal{X}$
 the corresponding union
$\Sigma_\mathcal{X}$
 the corresponding union 
 $\sigma$
-algebra. This space can support all finite
$\sigma$
-algebra. This space can support all finite 
 $\mathcal{S}$
-valued trajectories and allows the assignment of a dominating base measure
$\mathcal{S}$
-valued trajectories and allows the assignment of a dominating base measure 
 $\mu_\mathcal{X}$
 with respect to which to define a trajectory density. For details, we refer the reader to [Reference Daley and Vere-Jones9].
$\mu_\mathcal{X}$
 with respect to which to define a trajectory density. For details, we refer the reader to [Reference Daley and Vere-Jones9].
 Let 
 $\mathbb{P}$
 be a reference probability measure on
$\mathbb{P}$
 be a reference probability measure on 
 $(\Omega,\mathcal{F})$
. For all
$(\Omega,\mathcal{F})$
. For all 
 $A\in\mathcal{B}(\mathbb{R}^n_+)$
, we write
$A\in\mathcal{B}(\mathbb{R}^n_+)$
, we write 
 \begin{equation*}\mathbb{P}(\boldsymbol{\lambda}^{-1}(A))=\mathbb{P}_{\boldsymbol{\lambda}}(A) =\int_{A}f_{\boldsymbol{\lambda}}(\textbf{a})\,\mu_{\mathbb{R}_+^n}(d\textbf{a}),\end{equation*}
\begin{equation*}\mathbb{P}(\boldsymbol{\lambda}^{-1}(A))=\mathbb{P}_{\boldsymbol{\lambda}}(A) =\int_{A}f_{\boldsymbol{\lambda}}(\textbf{a})\,\mu_{\mathbb{R}_+^n}(d\textbf{a}),\end{equation*}
where 
 $f_{\boldsymbol{\lambda}}$
 denotes the joint density function of n independent gamma-distributed variables. Hence, we assume that the distribution of instantaneous rates under
$f_{\boldsymbol{\lambda}}$
 denotes the joint density function of n independent gamma-distributed variables. Hence, we assume that the distribution of instantaneous rates under 
 $\mathbb{P}$
 admits a density carried by the (Lebesgue) measure
$\mathbb{P}$
 admits a density carried by the (Lebesgue) measure 
 $\mu_{\mathbb{R}_+^n}$
. Next, let
$\mu_{\mathbb{R}_+^n}$
. Next, let 
 $\kappa_1:\mathcal{F}\times\mathbb{R}^n_+\rightarrow[0,1]$
 be a regular conditional probability, i.e., a Markov kernel that defines a probability measure on
$\kappa_1:\mathcal{F}\times\mathbb{R}^n_+\rightarrow[0,1]$
 be a regular conditional probability, i.e., a Markov kernel that defines a probability measure on 
 $\mathcal{F}$
 for all
$\mathcal{F}$
 for all 
 $\boldsymbol{\lambda}\in\mathbb{R}^n_+$
, with
$\boldsymbol{\lambda}\in\mathbb{R}^n_+$
, with 
 \begin{equation*}\mathbb{P}(B\cap\boldsymbol{\lambda}^{-1}(A))=\int_A \kappa_1(B,\textbf{a}) \, f_{\boldsymbol{\lambda}}(\textbf{a})\,\mu_{\mathbb{R}_+^n}(d\textbf{a})\end{equation*}
\begin{equation*}\mathbb{P}(B\cap\boldsymbol{\lambda}^{-1}(A))=\int_A \kappa_1(B,\textbf{a}) \, f_{\boldsymbol{\lambda}}(\textbf{a})\,\mu_{\mathbb{R}_+^n}(d\textbf{a})\end{equation*}
for 
 $A\in\mathcal{B}(\mathbb{R}^n_+)$
 and
$A\in\mathcal{B}(\mathbb{R}^n_+)$
 and 
 $B\in\mathcal{F}$
. By definition,
$B\in\mathcal{F}$
. By definition, 
 $\kappa_1(B,\textbf{a})= \mathbb{P}(B|\boldsymbol{\lambda}=\textbf{a})$
, and most importantly,
$\kappa_1(B,\textbf{a})= \mathbb{P}(B|\boldsymbol{\lambda}=\textbf{a})$
, and most importantly, 
 \begin{equation*}\kappa_1(X^{-1}(C),\textbf{a}) = \int_C f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x}) \,\mu_\mathcal{X}(d\textbf{t},d\textbf{x})\end{equation*}
\begin{equation*}\kappa_1(X^{-1}(C),\textbf{a}) = \int_C f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x}) \,\mu_\mathcal{X}(d\textbf{t},d\textbf{x})\end{equation*}
for all 
 $C\in\Sigma_{\mathcal{X}}$
 (note this often constitutes an intractable integral). The conditional density
$C\in\Sigma_{\mathcal{X}}$
 (note this often constitutes an intractable integral). The conditional density 
 $f_{X|\boldsymbol{\lambda}=\textbf{a}}$
 is such that for every
$f_{X|\boldsymbol{\lambda}=\textbf{a}}$
 is such that for every 
 $I\in \mathbb{N}$
 and pair of ordered times
$I\in \mathbb{N}$
 and pair of ordered times 
 $\textbf{t}=\{0,t_1,\dots,t_I\}$
 in [0, T] and states
$\textbf{t}=\{0,t_1,\dots,t_I\}$
 in [0, T] and states 
 $\textbf{x}=\{x_0,\dots,x_I\}$
 in
$\textbf{x}=\{x_0,\dots,x_I\}$
 in 
 $\mathcal{S}$
 we have
$\mathcal{S}$
 we have 
 \begin{equation*}f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x}) = \pi(x_0)\, e^{Q_{x_{I}}(T-t_I)}\, \prod_{i=1}^I Q_{x_{i-1},x_{i}}\, e^{Q_{x_{i-1}}(t_i-t_{i-1})}, \end{equation*}
\begin{equation*}f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x}) = \pi(x_0)\, e^{Q_{x_{I}}(T-t_I)}\, \prod_{i=1}^I Q_{x_{i-1},x_{i}}\, e^{Q_{x_{i-1}}(t_i-t_{i-1})}, \end{equation*}
where 
 $Q\equiv Q(\textbf{a})$
 is the matrix of infinitesimal transition rates in X associated to values in
$Q\equiv Q(\textbf{a})$
 is the matrix of infinitesimal transition rates in X associated to values in 
 $\textbf{a}$
. Finally, network observations are assumed to be discrete events, independent of transition rates given a trajectory. Thus, there exists a kernel
$\textbf{a}$
. Finally, network observations are assumed to be discrete events, independent of transition rates given a trajectory. Thus, there exists a kernel 
 $\kappa_2:\mathcal{F}\times(\mathcal{X} \times \mathbb{R}^n_+)\rightarrow[0,1]$
 such that
$\kappa_2:\mathcal{F}\times(\mathcal{X} \times \mathbb{R}^n_+)\rightarrow[0,1]$
 such that 
 \begin{equation*}\mathbb{P}(O_k\in D | X = (\textbf{t},\textbf{x}), \boldsymbol{\lambda}=\textbf{a}) = \kappa_2(O_k^{-1}(D), (\textbf{t},\textbf{x}),\textbf{a}) = \sum_{d\in D} f_{O_k|(\textbf{t},\textbf{x})}(d) \, \mu_\mathcal{O}(d) \end{equation*}
\begin{equation*}\mathbb{P}(O_k\in D | X = (\textbf{t},\textbf{x}), \boldsymbol{\lambda}=\textbf{a}) = \kappa_2(O_k^{-1}(D), (\textbf{t},\textbf{x}),\textbf{a}) = \sum_{d\in D} f_{O_k|(\textbf{t},\textbf{x})}(d) \, \mu_\mathcal{O}(d) \end{equation*}
for all 
 $k=1,\dots,K$
 and
$k=1,\dots,K$
 and 
 $D\in\mathcal{P}(\mathcal{O})$
. Here,
$D\in\mathcal{P}(\mathcal{O})$
. Here, 
 $f_{O_k|(\textbf{t},\textbf{x})}$
 defines an arbitrary probability mass function on
$f_{O_k|(\textbf{t},\textbf{x})}$
 defines an arbitrary probability mass function on 
 $\mathcal{O}$
 carried by a counting measure; in our applications, each observation depends only on the state of the system at the observation time, so the above expression could be further simplified.
$\mathcal{O}$
 carried by a counting measure; in our applications, each observation depends only on the state of the system at the observation time, so the above expression could be further simplified.
 Under the above model construction, the support over infinitesimal rates is a standard Borel space, and the existence of a posterior distribution is guaranteed (cf. [Reference Orbanz and Teh20]). Also, measures induced by the kernel 
 $\kappa_2$
 are
$\kappa_2$
 are 
 $\sigma$
-finite and such that
$\sigma$
-finite and such that 
 $\kappa_2(\cdot,(\textbf{t},\textbf{x}),\textbf{a})<<\mu_{\mathcal{O}}$
 for every
$\kappa_2(\cdot,(\textbf{t},\textbf{x}),\textbf{a})<<\mu_{\mathcal{O}}$
 for every 
 $((\textbf{t},\textbf{x}),\textbf{a})\in\mathcal{X}\times\mathbb{R}^n_+$
. The posterior is thus carried by its corresponding prior and defined by means of the Radon–Nikodym derivative
$((\textbf{t},\textbf{x}),\textbf{a})\in\mathcal{X}\times\mathbb{R}^n_+$
. The posterior is thus carried by its corresponding prior and defined by means of the Radon–Nikodym derivative 
 \begin{equation*}\frac{d\mathbb{P}_{\boldsymbol{\lambda}|O_1=o_1,\dots,O_K=o_K}}{d\mathbb{P}_{\boldsymbol{\lambda}}}(\textbf{a}) = \frac{\int_{\mathcal{X}}\prod_{k=1}^K f_{O_k|(\textbf{t},\textbf{x})}(o_k)\, f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})}{\int_{\mathbb{R}^n_+}\int_{\mathcal{X}}\prod_{k=1}^K f_{O_k|(\textbf{t},\textbf{x})}(o_k)\, f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})\, f_{\boldsymbol{\lambda}}(\textbf{a})\mu_{\mathbb{R}^n_+}(d\textbf{a}) } ,\end{equation*}
\begin{equation*}\frac{d\mathbb{P}_{\boldsymbol{\lambda}|O_1=o_1,\dots,O_K=o_K}}{d\mathbb{P}_{\boldsymbol{\lambda}}}(\textbf{a}) = \frac{\int_{\mathcal{X}}\prod_{k=1}^K f_{O_k|(\textbf{t},\textbf{x})}(o_k)\, f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})}{\int_{\mathbb{R}^n_+}\int_{\mathcal{X}}\prod_{k=1}^K f_{O_k|(\textbf{t},\textbf{x})}(o_k)\, f_{X|\boldsymbol{\lambda}=\textbf{a}}(\textbf{t},\textbf{x})\, \mu_{\mathcal{X}}(d\textbf{t},d\textbf{x})\, f_{\boldsymbol{\lambda}}(\textbf{a})\mu_{\mathbb{R}^n_+}(d\textbf{a}) } ,\end{equation*}
where we employ the shorthand notation 
 $d\mathbb{P}_{\boldsymbol{\lambda}|\cdot}(\textbf{a}) = \mathbb{P}_{\boldsymbol{\lambda}}(d\textbf{a}|\cdot)$
.
$d\mathbb{P}_{\boldsymbol{\lambda}|\cdot}(\textbf{a}) = \mathbb{P}_{\boldsymbol{\lambda}}(d\textbf{a}|\cdot)$
.
Acknowledgements
The authors would like to thank the anonymous reviewers, who greatly improved the quality of this paper. This work was supported by RCUK through the Horizon Digital Economy Research grants EP/G065802/1 and EP/M000877/1.
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 












