# Linear quadratic optimal control of conditional McKean-Vlasov equation with random coefficients and applications

## Abstract

We consider the optimal control problem for a linear conditional McKean-Vlasov equation with quadratic cost functional. The coefficients of the system and the weighting matrices in the cost functional are allowed to be adapted processes with respect to the common noise filtration. Semi closed-loop strategies are introduced, and following the dynamic programming approach in (Pham and Wei, Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics, 2016), we solve the problem and characterize time-consistent optimal control by means of a system of decoupled backward stochastic Riccati differential equations. We present several financial applications with explicit solutions, and revisit, in particular, optimal tracking problems with price impact, and the conditional mean-variance portfolio selection in an incomplete market model.

## Introduction and problem formulation

Let us formulate the linear quadratic optimal control of conditional (also called stochastic) McKean-Vlasov equation with random coefficients (LQCMKV in short form). Consider the controlled stochastic McKean-Vlasov dynamics in $$\mathbb {R}^{d}$$ given by

$$\begin{array}{@{}rcl@{}} {dX}_{t} &=& b_{t}\left(X_{t},\mathbb{E}[X_{t}|W^{0}],\alpha_{t}\right) dt + \sigma_{t}\left(X_{t},\mathbb{E}[X_{t}|W^{0}],\alpha_{t}\right) {dW}_{t} \\ & & \;\;\;\;\; + \; {\sigma^{0}_{t}}\left(X_{t},\mathbb{E}[X_{t}|W^{0}],\alpha_{t}\right) d{W^{0}_{t}}, \;\;\; 0 \leq t \leq T, \;\;\; X_{0} \; = \; \xi_{0}. \end{array}$$
(1)

Here $$W, W^{0}$$ are two independent one-dimensional Brownian motions on some probability space $$(\Omega,{\mathcal {F}},\mathbb {P})$$, $$\mathbb {F}^{0} = ({{\mathcal {F}}_{t}^{0}})_{0\leq t \leq T}$$ is the natural filtration generated by $$W^{0}$$, $$\mathbb {F} = ({\mathcal {F}}_{t})_{0\leq t\leq T}$$ is the natural filtration generated by $$(W,W^{0})$$, augmented with an independent σ-algebra $${\mathcal {G}}$$, $$\xi _{0} \in L^{2}({\mathcal {G}};\mathbb {R}^{d})$$ is a square-integrable $${\mathcal {G}}$$-measurable random variable with values in $$\mathbb {R}^{d}$$, $$\mathbb {E}[X_{t}|W^{0}]$$ denotes the conditional expectation of $$X_{t}$$ given the whole σ-algebra $${{\mathcal {F}}_{T}^{0}}$$ of $$W^{0}$$, and the control process α is an $$\mathbb {F}^{0}$$-progressively measurable process with values in A, equal either to $$\mathbb {R}^{m}$$ or to $$L(\mathbb {R}^{d};\mathbb {R}^{m})$$, the set of Lipschitz functions from $$\mathbb {R}^{d}$$ into $$\mathbb {R}^{m}$$. This distinction between the control sets will be discussed later in the introduction, but for the moment, one may roughly interpret the case $$A = \mathbb {R}^{m}$$ as modeling open-loop control and the case $$A = L(\mathbb {R}^{d};\mathbb {R}^{m})$$ as modeling closed-loop control. When $$A = \mathbb {R}^{m}$$, we require that α satisfies the square-integrability condition $$L^{2}(\Omega \times [0,T])$$, i.e., $$\mathbb {E}\left [\int _{0}^{T} |\alpha _{t}|^{2} dt\right ] < \infty$$, and we denote by $${\mathcal {A}}$$ the set of control processes. The coefficients $$b_{t}(x,\bar x,a), \sigma _{t}(x,\bar x,a), {\sigma _{t}^{0}}(x,\bar x,a)$$, $$0\leq t\leq T$$, are $$\mathbb {F}^{0}$$-adapted processes with values in $$\mathbb {R}^{d}$$, for any $$x,\bar x \in \mathbb {R}^{d}$$, $$a \in A$$, and of linear form:

$$\begin{array}{ccc} b_{t}(x,\bar x,a) &=&\left\{ \begin{array}{ll} {b_{t}^{0}} + B_{t} x + \bar B_{t} \bar x + C_{t} a & \text{if} \, A = \mathbb{R}^{m} \\ {b_{t}^{0}} + B_{t} x + \bar B_{t} \bar x + C_{t} a(x) & \text{if}\, A = L(\mathbb{R}^{d};\mathbb{R}^{m}) \end{array} \right. \\ \sigma_{t}(x,\bar x,a) &=&\left\{ \begin{array}{ll} \gamma_{t} + D_{t} x + \bar D_{t} \bar x + F_{t} a & \text{if}\, A = \mathbb{R}^{m} \\ \gamma_{t} + D_{t} x + \bar D_{t} \bar x + F_{t} a(x) & \text{if}\, A = L(\mathbb{R}^{d};\mathbb{R}^{m}) \end{array} \right. \\ {\sigma_{t}^{0}}(x,\bar x,a) &=&\left\{ \begin{array}{ll} {\gamma_{t}^{0}} + {D_{t}^{0}} x + \bar {D_{t}^{0}} \bar x + {F_{t}^{0}} a & \text{if}\, A = \mathbb{R}^{m} \\ {\gamma_{t}^{0}} + {D_{t}^{0}} x + \bar {D_{t}^{0}} \bar x + {F_{t}^{0}} a(x) & \text{if}\, A = L(\mathbb{R}^{d};\mathbb{R}^{m}), \end{array} \right. \end{array}$$
(2)

where $$b^{0}, \gamma, \gamma^{0}$$ are $$\mathbb {F}^{0}$$-adapted processes vector-valued in $$\mathbb {R}^{d}$$, satisfying a square-integrability condition $$L^{2}(\Omega \times [0,T])$$: $$\mathbb {E}\left [\int _{0}^{T} |b_{t}|^{2} + |{b_{t}^{0}}|^{2} + |\gamma _{t}|^{2} + |{\gamma _{t}^{0}}|^{2} dt\right ] < \infty$$, $$B, \bar B, D, \bar D, D^{0}, \bar D^{0}$$ are essentially bounded $$\mathbb {F}^{0}$$-adapted processes matrix-valued in $$\mathbb {R}^{d\times d}$$, and $$C, F, F^{0}$$ are essentially bounded $$\mathbb {F}^{0}$$-adapted processes matrix-valued in $$\mathbb {R}^{d\times m}$$. For any $$\alpha \in {\mathcal {A}}$$, there exists a unique strong solution $$X = X^{\alpha}$$ to (1), which is $$\mathbb {F}$$-adapted, and satisfies the square-integrability condition $${\mathcal {S}}^{2}(\Omega \times [0,T])$$:

$$\begin{array}{@{}rcl@{}} \mathbb{E}\left[ \sup_{0\leq t\leq T} |X_{t}^{\alpha}|^{2}\right] & \leq & C_{\alpha} \left(1 + \mathbb{E}|\xi_{0}|^{2} \right) \; < \; \infty, \end{array}$$
(3)

for some positive constant $$C_{\alpha}$$ depending on α: when $$A = \mathbb {R}^{m}$$, $$C_{\alpha }$$ depends on α via $$\mathbb {E}\left [\int _{0}^{T} |\alpha _{t}|^{2} dt\right ] < \infty$$, and when $$A = L\left (\mathbb {R}^{d};\mathbb {R}^{m}\right)$$, $$C_{\alpha }$$ depends on α via its Lipschitz constant.
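As a rough illustration of the dynamics (1)-(2), the following sketch simulates a scalar case (d = m = 1) with constant coefficients and a deterministic open-loop control. It approximates the conditional mean given the common noise by the empirical mean of particles that are driven by independent idiosyncratic noises and one shared common-noise path. The function names, the dictionary keys, the Euler discretization, and the particle approximation are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random

COEF_NAMES = ("b0", "B", "Bbar", "C", "gamma", "D", "Dbar", "F",
              "gamma0", "D0", "D0bar", "F0")


def zero_coefficients():
    """Return a coefficient dictionary with every entry of (2) set to zero."""
    return {name: 0.0 for name in COEF_NAMES}


def simulate_lqcmkv(x0, n_steps, T, coef, alpha, seed=0):
    """Euler scheme for a scalar version of (1)-(2) with constant coefficients.

    x0    : list of initial particle positions (samples of xi_0)
    alpha : deterministic open-loop control, a function of time (a
            simplification; the paper allows controls adapted to the
            common-noise filtration)
    Returns the terminal positions and the path of empirical means.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    x = [float(v) for v in x0]
    means = [sum(x) / len(x)]
    for k in range(n_steps):
        xbar = means[-1]               # empirical proxy for E[X_t | W^0]
        a = alpha(k * dt)
        dw0 = rng.gauss(0.0, sqrt_dt)  # common-noise increment, shared by all particles
        new_x = []
        for xi in x:
            drift = coef["b0"] + coef["B"] * xi + coef["Bbar"] * xbar + coef["C"] * a
            vol = coef["gamma"] + coef["D"] * xi + coef["Dbar"] * xbar + coef["F"] * a
            vol0 = coef["gamma0"] + coef["D0"] * xi + coef["D0bar"] * xbar + coef["F0"] * a
            dw = rng.gauss(0.0, sqrt_dt)  # idiosyncratic increment, one per particle
            new_x.append(xi + drift * dt + vol * dw + vol0 * dw0)
        x = new_x
        means.append(sum(x) / len(x))
    return x, means


coef = zero_coefficients()
coef.update(B=-1.0, Bbar=0.5, gamma=0.3, gamma0=0.2)
terminal, means = simulate_lqcmkv([1.0] * 200, n_steps=100, T=1.0,
                                  coef=coef, alpha=lambda t: 0.0)
print(len(terminal), len(means))
```

With the idiosyncratic coefficients set to zero and identical initial positions, all particles move together, which mirrors the remark that the state is then driven by the common noise alone.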

The cost functional to be minimized over $$\alpha \in {\mathcal {A}}$$ is:

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} f_{t}\left(X_{t}^{\alpha}, \mathbb{E}\left[X_{t}^{\alpha}|W^{0}\right],\alpha_{t}\right) dt + g\left(X_{T}^{\alpha},\mathbb{E}\left[X_{T}^{\alpha}|W^{0}\right]\right) \right], \\ \rightarrow \;\;\; V_{0} & := & \inf_{\alpha\in{\mathcal{A}}} J(\alpha), \end{array}$$

where $$\{f_{t}(x,\bar x,a), 0\leq t\leq T\}$$ is an $$\mathbb {F}^{0}$$-adapted real-valued process, $$g(x,\bar x)$$ is an $${{\mathcal {F}}_{T}^{0}}$$-measurable random variable, for any $$x,\bar x \in \mathbb {R}^{d}$$, $$a \in A$$, of quadratic form:

$$\begin{array}{ccl} f_{t}(x,\bar x,a) &=& \left\{ \begin{array}{ll} x^{\scriptscriptstyle{\intercal}} Q_{t} x + \bar x^{\scriptscriptstyle{\intercal}} \bar Q_{t} \bar x + M_{t}^{\scriptscriptstyle{\intercal}} x + a^{\scriptscriptstyle{\intercal}} N_{t} a & \text{if}\, A = \mathbb{R}^{m} \\ x^{\scriptscriptstyle{\intercal}} Q_{t} x + \bar x^{\scriptscriptstyle{\intercal}} \bar Q_{t} \bar x + M_{t}^{\scriptscriptstyle{\intercal}} x + a(x)^{\scriptscriptstyle{\intercal}} N_{t} a(x) & \text{if}\, A = L(\mathbb{R}^{d};\mathbb{R}^{m}) \end{array} \right. \\ g(x,\bar x) &=& x^{\scriptscriptstyle{\intercal}} P x + \bar x^{\scriptscriptstyle{\intercal}} \bar P \bar x + L^{\scriptscriptstyle{\intercal}} x, \end{array}$$
(4)

where Q, $$\bar Q$$ are essentially bounded $$\mathbb {F}^{0}$$-adapted processes, with values in $$\mathbb {S}^{d}$$ the set of symmetric matrices in $$\mathbb {R}^{d\times d}$$, P, $$\bar P$$ are essentially bounded $${{\mathcal {F}}_{T}^{0}}$$-measurable random matrices in $$\mathbb {S}^{d}$$, N is an essentially bounded $$\mathbb {F}^{0}$$-adapted process, with values in $$\mathbb {S}^{m}$$, M is an $$\mathbb {F}^{0}$$-adapted process with values in $$\mathbb {R}^{d}$$, satisfying a square-integrability condition $$L^{2}(\Omega \times [0,T])$$, L is an $${{\mathcal {F}}_{T}^{0}}$$-measurable square-integrable random vector in $$\mathbb {R}^{d}$$, and $$^{\scriptscriptstyle {\intercal }}$$ denotes the transpose of any vector or matrix.

The above control formulation of stochastic McKean-Vlasov equations provides a unified framework for some important classes of control problems. In particular, it is motivated by the asymptotic formulation of cooperative equilibrium for a large population of particles (players) in mean-field interaction under common noise (see, e.g., (Carmona and Zhu 2016; Carmona et al. 2013)) and also occurs when the cost functional involves the first and second moments of the (conditional) law of the state process, for example in the (conditional) mean-variance portfolio selection problem (see, e.g., (Basak and Chabakauri 2010; Borkar and Kumar 2010; Li and Zhou 2000)). When $$A = L(\mathbb {R}^{d};\mathbb {R}^{m})$$, this corresponds to the problem of a (representative) agent using a control α based on her/his current private state $$X_{t}$$ at time t and on the information brought by the common noise $${{\mathcal {F}}_{T}^{0}}$$, typically the conditional mean $$\mathbb {E}[X_{t}|W^{0}]$$, which represents, in the large population equilibrium interpretation, the limit of the empirical mean of the state of all the players when their number tends to infinity, by propagation of chaos. In other words, the control α may be viewed as a semi closed-loop control, i.e., closed-loop w.r.t. the state process and open-loop w.r.t. the common noise $$W^{0}$$, or alternatively as an $$\mathbb {F}^{0}$$-progressively measurable random field control $$\alpha = \left \{\alpha _{t}(x), 0 \leq t \leq T, x \in \mathbb {R}^{d}\right \}$$. This class of semi closed-loop controls extends the class of closed-loop strategies for the LQ control of McKean-Vlasov equations (or mean-field stochastic differential equations) without common noise $$W^{0}$$, as recently studied in (Li et al. 2016), where the controls are chosen at any time t in linear form w.r.t. the current state value $$X_{t}$$ and the deterministic expected value $$\mathbb {E}[X_{t}]$$.
When $$A = \mathbb {R}^{m}$$, the LQCMKV problem may be viewed as a special partial observation control problem for a state dynamics as in (1), where the controls are of open-loop form and adapted w.r.t. an observation filtration $$\mathbb {F}^{I} = \mathbb {F}^{0}$$ generated by some exogenous random factor process I driven by $$W^{0}$$. In the case where σ = 0, we see that the process X is $$\mathbb {F}^{0}$$-adapted, hence $$\mathbb {E}[X_{t}|W^{0}] = X_{t}$$, and the LQCMKV problem is reduced to the classical LQ control problem (see, e.g., (Yong and Zhou 1999)) with random coefficients, with open-loop controls for $$A = \mathbb {R}^{m}$$ or closed-loop controls for $$A = L(\mathbb {R}^{d};\mathbb {R}^{m})$$. Note that this distinction between open-loop and closed-loop strategies for LQ control problems has been recently introduced in (Sun and Yong 2014), where closed-loop controls are assumed to be of linear form w.r.t. the current state value, while here they are a priori only assumed to be Lipschitz w.r.t. the current state value.

Optimal control of McKean-Vlasov equations is a rather new topic in the area of stochastic control and applied probability, and has been addressed, e.g., in (Andersson and Djehiche 2010; Bensoussan et al. 2013; Buckdahn et al. 2011; Carmona and Delarue 2015; Pham and Wei 2015). In this McKean-Vlasov context, the class of linear quadratic optimal control problems, which provides a typical case for solvable applications, has been studied in several papers, among them (Hu et al. 2012; Huang et al. 2015; Sun 2015; Yong 2013), where the coefficients are assumed to be deterministic. It is often argued that, due to the presence of the law of the state in a nonlinear way (here for the LQ problem, the square of the expectation), the problem is time-inconsistent in the sense that an optimal control viewed from today is no longer optimal when viewed from tomorrow, and this would prevent a priori the use of the dynamic programming method. To tackle time inconsistency, one then typically focuses on either pre-commitment strategies, i.e., controls that are optimal for the problem viewed at the initial time but may not be optimal at future dates, or game-equilibrium strategies, i.e., control decisions considered as a game against all the future decisions the controller is going to make.

In this paper, we shall focus on the optimal control for the initial value $$V_{0}$$ of the LQCMKV problem with random coefficients, but following the approach developed in (Pham and Wei 2016), we emphasize that time consistency can actually be restored for pre-commitment strategies, provided that one considers as state variable the conditional law of the state process instead of the state itself, therefore making possible the use of the dynamic programming method. We show that the dynamic version of the LQCMKV control problem, defined by a random field value function, has a quadratic structure with respect to the conditional law of the state process, leading to a characterization of the optimal control in terms of a decoupled system of backward stochastic Riccati equations (BSREs) whose existence and uniqueness are obtained in connection with a standard LQ control problem. The main ingredient for such derivation is an Itô’s formula along a flow of conditional measures and a suitable notion of differentiability with respect to probability measures. We illustrate our results with several financial applications. We first revisit the optimal trading and benchmark tracking problem with price impact for general price and target processes, and obtain closed-form solutions extending some known results in the literature. We next solve a variation of the mean-variance portfolio selection problem in an incomplete market with random factor. Our last example considers an interbank systemic risk model with random factor in a common noise environment.

The paper is organized as follows. “Preliminaries” section gives some key preliminaries: we reformulate the LQCMKV problem into a problem involving the conditional law of the state process as state variable for which a dynamic programming verification theorem is stated and time consistency holds. We also recall the Itô’s formula along a flow of conditional measures. “Backward stochastic Riccati equations” section is devoted to the characterization of the optimal control by means of a system of BSREs in the case of both a control set A $$= \mathbb {R}^{m}$$ and A $$= L(\mathbb {R}^{d};\mathbb {R}^{m})$$. We develop in “Applications” section the applications.

We end this introduction with some notations.

Notations. We denote by $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$ the set of probability measures μ on $$\mathbb {R}^{d}$$ which are square integrable, i.e., $$\|\mu \|_{_{2}}^{2} := \int _{\mathbb {R}^{d}} |x|^{2} \mu (dx) < \infty$$. For any $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, we denote by $$L_{\mu }^{2}(\mathbb {R}^{q})$$ the set of measurable functions $$\varphi : \mathbb {R}^{d} \rightarrow \mathbb {R}^{q}$$ which are square integrable with respect to μ, by $$L_{\mu \otimes \mu }^{2}(\mathbb {R}^{q})$$ the set of measurable functions $$\psi : \mathbb {R}^{d}\times \mathbb {R}^{d} \rightarrow \mathbb {R}^{q}$$ which are square integrable with respect to the product measure $$\mu \otimes \mu$$, and we set

$$\begin{array}{@{}rcl@{}} \mu(\varphi) := \int \varphi(x)\, \mu(dx), \, \bar\mu \, := \, \int \! x \mu(dx), & & \!\! \mu\otimes\mu(\psi) \, := \,\! \int \!\psi(x,x') \mu(dx)\mu(dx'). \end{array}$$

We also define $$L_{\mu }^{\infty }(\mathbb {R}^{q})$$ (resp. $$L_{\mu \otimes \mu }^{\infty }(\mathbb {R}^{q})$$) as the subset of elements $$\varphi \in L_{\mu }^{2}(\mathbb {R}^{q})$$ (resp. $$L_{\mu \otimes \mu }^{2}(\mathbb {R}^{q})$$) which are bounded μ (resp. $$\mu \otimes \mu$$) a.e., and $$\|\varphi \|_{_{\infty }}$$ is their essential supremum. For any random variable X on $$(\Omega,{\mathcal {F}},\mathbb {P})$$, we denote by $${\mathcal {L}}(X)$$ its probability law (or distribution) under $$\mathbb {P}$$, by $${\mathcal {L}}(X|W^{0})$$ its conditional law given $${{\mathcal {F}}_{T}^{0}}$$, and we shall assume w.l.o.g. that $${\mathcal {G}}$$ is rich enough in the sense that $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d}) = \left \{{\mathcal {L}}(\xi): \xi \in L^{2}({\mathcal {G}};\mathbb {R}^{d})\right \}$$.

## Preliminaries

For any $$\alpha \in {\mathcal {A}}$$, and $$X^{\alpha } = (X_{t}^{\alpha })_{0\leq t\leq T}$$ the solution to (1), we define $$\rho _{t}^{\alpha } = {\mathcal {L}}(X_{t}^{\alpha }|W^{0})$$ as the conditional law of $$X_{t}^{\alpha }$$ given $${{\mathcal {F}}_{T}^{0}}$$ for $$0 \leq t \leq T$$. Since $$X^{\alpha}$$ is $$\mathbb {F}$$-adapted, and $$W^{0}$$ is a $$(\mathbb {P},\mathbb {F})$$-Wiener process, we notice that $$\rho _{t}^{\alpha }(dx) = \mathbb {P}\left [X_{t}^{\alpha } \in dx| {{\mathcal {F}}_{T}^{0}}\right ] = \mathbb {P}\left [X_{t}^{\alpha } \in dx| {{\mathcal {F}}_{t}^{0}}\right ]$$, and thus $$\left \{\rho _{t}^{\alpha },0\leq t\leq T \right \}$$ admits an $$\mathbb {F}^{0}$$-progressively measurable modification (see, e.g., Theorem 2.24 in (Bain and Crisan 2009)), that will be identified with itself in the sequel, and is valued in $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$ by (3), namely:

$$\begin{array}{@{}rcl@{}} \mathbb{E} \left[ \sup_{0\leq t\leq T} \|\rho_{t}^{\alpha}\|_{_{2}}^{2} \right] & \leq & C_{\alpha} \left(1 + \mathbb{E}|\xi_{0}|^{2} \right). \end{array}$$
(5)

Moreover, we mention that the process $$\rho ^{\alpha } = (\rho _{t}^{\alpha })_{0\leq t\leq T}$$ has continuous trajectories as it is valued in $${\mathcal {P}}_{_{2}}(C([0,T];\mathbb {R}^{d}))$$, the set of square integrable probability measures on the space $$C([0,T];\mathbb {R}^{d})$$ of continuous functions from [0,T] into $$\mathbb {R}^{d}$$.

Now, by the law of iterated conditional expectations, and recalling that $$\alpha \in {\mathcal {A}}$$ is $$\mathbb {F}^{0}$$-progressively measurable, we can rewrite the cost functional as

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \mathbb{E} \left[ f_{t}\left(X_{t}^{\alpha}, \bar\rho_{t}^{\alpha},\alpha_{t} \right) \left| {{\mathcal{F}}_{t}^{0}}\right. \right] dt + \mathbb{E} \left[ \left. g\left(X_{T}^{\alpha},\bar\rho_{T}^{\alpha}\right) \right| {{\mathcal{F}}_{T}^{0}} \right] \right] \\ &=& \mathbb{E} \left[ {\int_{0}^{T}} \rho_{t}^{\alpha} \left(f_{t}\left(.,\bar\rho_{t}^{\alpha},\alpha_{t}\right)\right) dt + \rho_{T}^{\alpha}\left(g\left(.,\bar\rho_{T}^{\alpha}\right)\right) \right] \\ &=& \mathbb{E} \left[ {\int_{0}^{T}} \hat{f}_{t}\left(\rho_{t}^{\alpha},\alpha_{t}\right) dt + \hat{g}\left(\rho_{T}^{\alpha}\right) \right], \end{array}$$
(6)

where we used in the second equality the fact that $$\{f_{t}(x,\bar {x},a), x,\bar {x} \in \mathbb {R}^{d}, a \in A, 0\leq t\leq T\}$$ is a random field $$\mathbb {F}^{0}$$-adapted process, $$g(x,\bar {x})$$ is $${{\mathcal {F}}_{T}^{0}}$$-measurable, and the $$\mathbb {F}^{0}$$-adapted process $$\left \{\hat {f}_{t}(\mu,a),0\leq t\leq T\right \}$$, the $${{\mathcal {F}}_{T}^{0}}$$-measurable random variable $$\hat {g}(\mu)$$, for $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, $$a \in A$$, are defined by

$$\left\{ \begin{array}{ccc} \hat{f}_{t}(\mu,a) &:=& \mu\left(f_{t}(.,\bar\mu,a)\right) \; = \; \int f_{t}(x,\bar \mu,a) \mu(dx) \\ \hat{g}(\mu) &:=& \mu\left(g(.,\bar\mu) \right) \; = \; \int g(x,\bar \mu) \mu(dx). \end{array} \right.$$

From the quadratic forms of f,g in (4), the random fields $$\hat {f}_{t}(\mu,a)$$ and $$\hat {g}(\mu), (t,\mu,a) \in [0,T]\times {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times A$$, are given by

$$\begin{array}{ccll} \hat{f}_{t}(\mu,a) &=& \left\{ \begin{array}{ll} \text{Var}(\mu,Q_{t}) + v_{2}(\mu,Q_{t} + \bar Q_{t}) & \\ \; + \; v_{1}(\mu,M_{t}) + a^{\scriptscriptstyle{\intercal}} N_{t} a & \text{if}\, A = \mathbb{R}^{m} \\ \text{Var}(\mu,Q_{t}) + v_{2}(\mu,Q_{t} + \bar Q_{t}) & \\ \; + \; v_{1}(\mu,M_{t}) + \int [a(x)^{\scriptscriptstyle{\intercal}} N_{t} a(x) ] \mu(dx) & \text{if} \,A = L\left(\mathbb{R}^{d};\mathbb{R}^{m}\right), \\ \end{array} \right. \\ \hat{g}(\mu) &=& \text{Var}(\mu,P) + v_{2}(\mu,P+\bar P) + v_{1}(\mu,L), & \end{array}$$
(7)

where we define the functions on $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times \mathbb {S}^{d}$$ and $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times \mathbb {R}^{d}$$ by:

$$\begin{array}{@{}rcl@{}} \text{Var}(\mu,k) &:=& \int (x-\bar\mu)^{\scriptscriptstyle{\intercal}} k (x-\bar\mu) \mu(dx), \;\;\; \mu \in {\mathcal{P}}_{_{2}}\left(\mathbb{R}^{d}\right), \; k \in \mathbb{S}^{d}, \\ v_{2}(\mu,\ell) &:=& \bar\mu^{\scriptscriptstyle{\intercal}} \ell \bar\mu, \;\;\; \mu \in {\mathcal{P}}_{_{2}}\left(\mathbb{R}^{d}\right), \; \ell \in \mathbb{S}^{d} \\ v_{1}(\mu,y) &:=& y^{\scriptscriptstyle{\intercal}}\bar\mu, \;\;\;\;\;\;\; \mu \in {\mathcal{P}}_{_{2}}\left(\mathbb{R}^{d}\right), \; y \in \mathbb{R}^{d}. \end{array}$$
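The decomposition (7) rests on the identity $$\int x^{\scriptscriptstyle{\intercal}} Q_{t} x \, \mu(dx) = \text{Var}(\mu,Q_{t}) + \bar\mu^{\scriptscriptstyle{\intercal}} Q_{t} \bar\mu$$, which holds because the cross terms vanish when integrating $$x - \bar\mu$$ against μ. The sketch below checks this numerically in the case $$A = \mathbb{R}^{m}$$ for an empirical measure with equal weights. The helper names and the sample data are hypothetical and chosen only for illustration.

```python
def quad(x, k):
    """Quadratic form x^T k x for a vector x and a square matrix k (nested lists)."""
    n = len(x)
    return sum(x[i] * k[i][j] * x[j] for i in range(n) for j in range(n))


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mean_vector(points):
    """Mean of the empirical measure putting equal weights on `points`."""
    n, d = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(d)]


def mat_add(k1, k2):
    """Entrywise sum of two matrices given as nested lists."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def f_hat_direct(points, Q, Qbar, M, N, a):
    """mu(f_t(., mu_bar, a)): average of the running cost (4) over the atoms of mu."""
    mbar = mean_vector(points)
    values = [quad(x, Q) + quad(mbar, Qbar) + dot(M, x) + quad(a, N) for x in points]
    return sum(values) / len(points)


def f_hat_decomposed(points, Q, Qbar, M, N, a):
    """Right-hand side of (7): Var(mu, Q) + v2(mu, Q + Qbar) + v1(mu, M) + a^T N a."""
    mbar = mean_vector(points)
    centered = [[xi - mi for xi, mi in zip(x, mbar)] for x in points]
    var = sum(quad(c, Q) for c in centered) / len(points)
    v2 = quad(mbar, mat_add(Q, Qbar))
    v1 = dot(M, mbar)
    return var + v2 + v1 + quad(a, N)


# Hypothetical data with d = 2 and m = 1.
points = [(1.0, 2.0), (3.0, -1.0), (0.0, 0.5)]
Q = [[2.0, 0.5], [0.5, 1.0]]
Qbar = [[-1.0, 0.0], [0.0, 0.3]]
M = [0.1, -0.2]
N = [[1.5]]
a = [0.7]
gap = abs(f_hat_direct(points, Q, Qbar, M, N, a)
          - f_hat_decomposed(points, Q, Qbar, M, N, a))
print(gap < 1e-10)
```

The same check applies to $$\hat g$$ by replacing $$(Q_{t}, \bar Q_{t}, M_{t})$$ with $$(P, \bar P, L)$$ and dropping the control term.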

We shall make the following assumptions on the coefficients of the model:

(H1) Q, $$Q+\bar Q$$, P, $$P+\bar P$$, N are nonnegative a.s.;

(H2) One of the two following conditions holds:

(i) N is uniformly positive definite, i.e., $$N_{t} \geq \delta I_{m}$$, $$0\leq t\leq T$$, a.s., for some δ > 0;

(ii) P or Q is uniformly positive definite, and F is uniformly nondegenerate, i.e., $$|F_{t}| \geq \delta$$, $$0\leq t\leq T$$, a.s., for some δ > 0.

Let us define the dynamic formulation of the stochastic McKean-Vlasov control problem. For any $$t \in [0,T]$$, $$\xi \in L^{2}\left ({\mathcal {G}};\mathbb {R}^{d}\right)$$, and $$\alpha \in {\mathcal {A}}$$, there exists a unique strong solution, denoted by $$\{X_{s}^{t,\xi,\alpha },t\leq s\leq T\}$$, to Eq. (1) starting from ξ at time t, and by noting that $$X^{t,\xi,\alpha}$$ is also unique in law, we see that the conditional law of $$X_{s}^{t,\xi,\alpha }$$ given $${{\mathcal {F}}_{T}^{0}}$$ depends on ξ only through its law $${\mathcal {L}}(\xi) = {\mathcal {L}}\left (\xi |W^{0}\right)$$ (recall that $${\mathcal {G}}$$ is independent of $$W^{0}$$). Then, recalling also that $${\mathcal {G}}$$ is rich enough, the relation

$$\begin{array}{@{}rcl@{}} \rho_{s}^{t,\mu,\alpha} &:=& {\mathcal{L}}\left(X_{s}^{t,\xi,\alpha} | W^{0}\right), \;\;\; t \leq s \leq T, \; \mu = {\mathcal{L}}(\xi), \end{array}$$

defines, for any $$t \in [0,T]$$, $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, and $$\alpha \in {\mathcal {A}}$$, an $$\mathbb {F}^{0}$$-progressively measurable continuous process (up to a modification) $$\left \{\rho _{s}^{t,\mu,\alpha },t\leq s\leq T\right \}$$, with values in $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, and as a consequence of the pathwise uniqueness of the solution $$\{X_{s}^{t,\xi,\alpha },t\leq s\leq T\}$$, we have the flow property for the conditional law (see Lemma 3.1 in (Pham and Wei 2016) for details):

$$\begin{array}{@{}rcl@{}} \rho_{s}^{\alpha} &=& \rho_{s}^{t,\rho_{t}^{\alpha},\alpha}, \;\;\; t \leq s \leq T, \; \alpha \in {\mathcal{A}}. \end{array}$$
(8)

We then consider the conditional cost functional

$$\begin{array}{@{}rcl@{}} J_{t}(\mu,\alpha)\! & =\! & \mathbb{E}\left[ {\int_{t}^{T}}\! \hat{f}_{s}\left(\rho_{s}^{t,\mu,\alpha},\alpha_{s}\right)\! ds \,+\, \hat{g}\!\left(\!\rho_{T}^{t,\mu,\alpha}\! \right)\! \left| {{\mathcal{F}}_{t}^{0}} \!\right.\right], \;\; t \! \in \![0,T],\! \mu\! \in\! {\mathcal{P}}_{_{2}}(\mathbb{R}^{d}), \alpha \!\in\! {\mathcal{A}}, \end{array}$$

which is well-defined by (5) and under the boundedness assumptions on the weighting matrices of the quadratic cost function. We next define the $$\mathbb {F}^{0}$$-adapted random field value function

$$\begin{array}{@{}rcl@{}} v_{t}(\mu) &=& \underset{\alpha\in{\mathcal{A}}}{\text{ess}\inf}\, J_{t}(\mu,\alpha), \;\;\; t \in [0,T], \mu \in {\mathcal{P}}_{_{2}}\left(\mathbb{R}^{d}\right), \end{array}$$

so that

$$\begin{array}{@{}rcl@{}} V_{0} & := & \inf_{\alpha\in{\mathcal{A}}} J(\alpha) \; = \; v_{0}({\mathcal{L}}(\xi_{0})), \end{array}$$
(9)

which may a priori take the value $$-\infty$$ for the moment. We shall see later that Assumptions (H1) and (H2) ensure that $$V_{0}$$ is finite and that there exists an optimal control. The dynamic counterpart of (9) is given by

$$\begin{array}{@{}rcl@{}} V_{t}^{\alpha} &:=& \underset{\beta\in{\mathcal{A}}_{t}(\alpha)}{\text{ess}\inf} J_{t}(\rho_{t}^{\alpha},\beta) \; = \; v_{t}(\rho_{t}^{\alpha}), \;\;\; t \in [0,T], \; \alpha \in {\mathcal{A}}, \end{array}$$
(10)

where $${\mathcal {A}}_{t}(\alpha) = \{ \beta \in {\mathcal {A}}: \beta _{s} = \alpha _{s}, s \leq t\}$$, and the second equality in (10) follows from the flow property (8) and the observation that $$\rho _{t}^{\beta } = \rho _{t}^{\alpha }$$ for $$\beta \in {\mathcal {A}}_{t}(\alpha)$$.

By using general results in (El Karoui) for dynamic programming, one can show (under the condition that the random field $$v_{t}(\mu)$$ is finite) that the process $$\left \{v_{t}(\rho _{t}^{\alpha }) + {\int _{0}^{t}} \hat {f}_{s}(\rho _{s}^{\alpha },\alpha _{s}) ds,0\leq t\leq T\right \}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$-submartingale, for any $$\alpha \in {\mathcal {A}}$$, and $$\alpha ^{*} \in {\mathcal {A}}$$ is an optimal control for $$V_{0}$$ if and only if $$\{v_{t}(\rho _{t}^{\alpha ^{*}}) + {\int _{0}^{t}} \hat {f}_{s}(\rho _{s}^{\alpha ^{*}},\alpha _{s}^{*}) ds,0\leq t\leq T\}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$-martingale. We shall use a converse result, namely a dynamic programming verification theorem, which takes the following formulation in our context.

### Lemma 2.1

Suppose that one can find an $$\mathbb {F}^{0}$$-adapted random field $$\{w_{t}(\mu), 0\leq t\leq T, \mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\}$$ satisfying the quadratic growth condition

$$\begin{array}{@{}rcl@{}} |w_{t}(\mu)| & \leq & C\|\mu\|_{_{2}}^{2} + I_{t}, \;\;\; \mu \in {\mathcal{P}}_{_{2}}(\mathbb{R}^{d}), \; 0 \leq t \leq T, \; a.s. \end{array}$$
(11)

for some positive constant C, and nonnegative $$\mathbb {F}^{0}$$-adapted process I with $$\mathbb {E}\left [\sup _{0\leq t\leq T}|I_{t}|\right ] < \infty$$, such that

(i) $$w_{T}(\mu) = \hat g(\mu)$$, $$\mu \in {\mathcal {P}}_{_{2}}\left (\mathbb {R}^{d}\right)$$;

(ii) $$\left \{w_{t}(\rho _{t}^{\alpha })+ {\int _{0}^{t}} \hat f_{s}\left (\rho _{s}^{\alpha },\alpha _{s}\right) ds,0\leq t\leq T\right \}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$ local submartingale, for any $$\alpha \in {\mathcal {A}}$$;

(iii) there exists $$\hat \alpha \in {\mathcal {A}}$$ such that $$\left \{w_{t}(\rho _{t}^{\hat \alpha }) + {\int _{0}^{t}} \hat f_{s}\left (\rho _{s}^{\hat \alpha },\hat \alpha _{s}\right) ds,0\leq t\leq T\right \}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$ local martingale.

Then $$\hat \alpha$$ is an optimal control for $$V_{0}$$, i.e., $$V_{0} = J(\hat \alpha)$$, and

$$\begin{array}{@{}rcl@{}}V_{0} &=& w_{0}({\mathcal{L}}(\xi_{0})). \end{array}$$

Moreover, $$\hat \alpha$$ is time consistent in the sense that

$$\begin{array}{@{}rcl@{}} V_{t}^{\hat\alpha} &=& J_{t}\left(\rho_{t}^{\hat\alpha},\hat\alpha \right), \;\;\; \forall 0 \leq t\leq T. \end{array}$$

### Proof

By the local submartingale property in condition (ii), there exists a nondecreasing sequence of $$\mathbb {F}^{0}$$-stopping times $$(\tau_{n})_{n}$$, $$\tau_{n} \nearrow T$$ a.s., such that

$$\begin{array}{@{}rcl@{}} \mathbb{E}\left[w_{\tau_{n}}\left(\rho_{\tau_{n}}^{\alpha}\right) + \int_{0}^{\tau_{n}} \hat{f}_{t}\left(\rho_{t}^{\alpha},\alpha_{t}\right) dt\right] & \geq & w_{0}\left(\rho_{0}^{\alpha}\right) \; = \; w_{0}({\mathcal{L}}(\xi_{0})), \;\;\; \forall \alpha \in {\mathcal{A}} \end{array}$$
(12)

From the quadratic form of f in (4), we easily see that for all n,

$$\begin{array}{@{}rcl@{}} \mathbb{E} \left[ \left| \int_{0}^{\tau_{n}} \hat{f}_{t}\left(\rho_{t}^{\alpha},\alpha_{t}\right) dt \right| \right] & \leq & C_{\alpha} \left(1 + \mathbb{E}\left[ \sup_{0\leq t\leq T} \|\rho_{t}^{\alpha}\|_{_{2}}^{2} \right] \right), \end{array}$$

for some positive constant $$C_{\alpha}$$ depending on α (when $$A = \mathbb {R}^{m}$$, $$C_{\alpha }$$ depends on α via $$\mathbb {E}\left [\int _{0}^{T} |\alpha _{t}|^{2} dt\right ] < \infty$$, and when $$A = L(\mathbb {R}^{d};\mathbb {R}^{m})$$, $$C_{\alpha }$$ depends on α via its Lipschitz constant). Together with the quadratic growth condition of w, and from (5), one can then apply the dominated convergence theorem by sending n to infinity in (12), and get

$$\begin{array}{@{}rcl@{}} w_{0}({\mathcal{L}}(\xi_{0})) & \! \leq\! & \mathbb{E}\left[w_{T}\left(\rho_{T}^{\alpha}\right) + {\int_{0}^{T}} \hat{f}_{t}\left(\rho_{t}^{\alpha},\alpha_{t}\right) dt\right] \, = \, \mathbb{E}\left[\hat{g}\left(\rho_{T}^{\alpha}\right) + {\int_{0}^{T}} \hat{f}_{t}\left(\rho_{t}^{\alpha},\alpha_{t}\right) dt\right] \, = \, J(\alpha) \end{array}$$

where we used the terminal condition (i), and the expression (6) of the cost functional. Since α is arbitrary in $${\mathcal {A}}$$, this shows that $$w_{0}({\mathcal {L}}(\xi _{0})) \leq V_{0}$$. The equality is obtained with the local martingale property for $$\hat \alpha$$ in condition (iii).
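For completeness, the equality step can be spelled out as follows. This is a sketch that uses only conditions (i) and (iii), the growth condition (11), and the same localization and dominated convergence argument as above, now with a localizing sequence $$(\tau_{n})_{n}$$ for the local martingale in (iii):

$$\begin{array}{rcl} w_{0}({\mathcal{L}}(\xi_{0})) &=& \mathbb{E}\left[ w_{\tau_{n}}\left(\rho_{\tau_{n}}^{\hat\alpha}\right) + \int_{0}^{\tau_{n}} \hat{f}_{t}\left(\rho_{t}^{\hat\alpha},\hat\alpha_{t}\right) dt \right] \\ & \underset{n \rightarrow \infty}{\longrightarrow} & \mathbb{E}\left[ \hat{g}\left(\rho_{T}^{\hat\alpha}\right) + {\int_{0}^{T}} \hat{f}_{t}\left(\rho_{t}^{\hat\alpha},\hat\alpha_{t}\right) dt \right] \; = \; J(\hat\alpha) \; \geq \; V_{0}. \end{array}$$

Combined with $$w_{0}({\mathcal {L}}(\xi _{0})) \leq V_{0}$$, this gives $$w_{0}({\mathcal {L}}(\xi _{0})) = V_{0} = J(\hat \alpha)$$.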

From the flow property (8), and since $$\rho _{t}^{\beta } = \rho _{t}^{\hat \alpha }$$ for $$\beta \in {\mathcal {A}}_{t}(\hat \alpha)$$, we notice that the local submartingale and martingale properties in (ii) and (iii) are formulated on the interval [t,T] as:

• $$\left \{w_{s}\left (\rho _{s}^{t,\rho _{t}^{\hat \alpha },\beta }\right)+ {\int _{t}^{s}} \hat {f}_{u}\left (\rho _{u}^{t,\rho _{t}^{\hat \alpha },\beta },\beta _{u}\right) du,t\leq s\leq T\right \}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$ local submartingale, for any $$\beta \in {\mathcal {A}}_{t}(\hat \alpha)$$;

• $$\left \{w_{s}\left (\rho _{s}^{t,\rho _{t}^{\hat \alpha },\hat \alpha }\right) + {\int _{t}^{s}} \hat {f}_{u}\left (\rho _{u}^{t,\rho _{t}^{\hat \alpha },\hat \alpha },\hat \alpha _{u}\right) du, t\leq s\leq T\right \}$$ is a $$(\mathbb {P},\mathbb {F}^{0})$$ local martingale.

By the same arguments as for the initial date, this implies that $$V_{t}^{\hat \alpha } = J_{t}\left (\rho _{t}^{\hat \alpha },\hat \alpha \right) = w_{t}(\rho _{t}^{\hat \alpha })$$, which means that $$\hat \alpha$$ is an optimal control over [t,T], once we start at time t from the initial state $$\rho _{t}^{\hat \alpha }$$, i.e., the time consistency of $$\hat \alpha$$. □

The practical application of Lemma 2.1 consists in finding a random field $$\left \{w_{t}(\mu), \mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d}),0\leq t\leq T \right \}$$, smooth (in a sense to be made precise), so that one can apply an Itô’s formula to $$\left \{w_{t}(\rho _{t}^{\alpha })+ {\int _{0}^{t}} \hat {f}_{s}\left (\rho _{s}^{\alpha },\alpha _{s}\right) ds,0\leq t\leq T\right \}$$, and check that the finite variation term is nonnegative for any $$\alpha \in {\mathcal {A}}$$ (the local submartingale condition), and equal to zero for some $$\hat \alpha \in {\mathcal {A}}$$ (the local martingale condition). For this purpose, we need a notion of derivative with respect to a probability measure, and shall rely on the one introduced by P.L. Lions in his course at Collège de France (Lions 2012). We briefly recall the basic definitions and refer to (Cardaliaguet 2012) for the details, see also (Buckdahn et al. 2014; Chassagneux et al. 2015). This notion is based on the lifting of functions u defined on $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$ into functions U defined on $$L^{2}({\mathcal {G}};\mathbb {R}^{d})$$ by setting $$U(X) = u({\mathcal {L}}(X))$$. We say that u is differentiable (resp. $${\mathcal {C}}^{1}$$) on $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$ if the lift U is Fréchet differentiable (resp. Fréchet differentiable with continuous derivatives) on $$L^{2}({\mathcal {G}};\mathbb {R}^{d})$$. In this case, the Fréchet derivative, viewed as an element $$DU(X)$$ of $$L^{2}({\mathcal {G}};\mathbb {R}^{d})$$ by Riesz’s theorem, can be represented as

$$\begin{array}{@{}rcl@{}} DU(X) &=& \partial_{\mu} u({\mathcal{L}}(X))(X), \end{array}$$

for some function $$\partial _{\mu } u({\mathcal {L}}(X)) : \mathbb {R}^{d} \rightarrow \mathbb {R}^{d}$$, which is called derivative of u at $$\mu = {\mathcal {L}}(X)$$. Moreover, $$\partial _{\mu } u(\mu) \in L^{2}_{\mu }(\mathbb {R}^{d})$$ for $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d}) = \left \{ {\mathcal {L}}(X): X \in L^{2}({\mathcal {G}};\mathbb {R}^{d})\right \}$$. Following (Chassagneux et al. 2015), we say that u is fully $${\mathcal {C}}^{2}$$ if it is $${\mathcal {C}}^{1}$$, the mapping $$(\mu,x) \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times \mathbb {R}^{d} \mapsto \partial _{\mu } u(\mu)(x)$$ is continuous and

1. (i)

for each fixed $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, the mapping x $$\in \mathbb {R}^{d} \mapsto \partial _{\mu } u(\mu)(x)$$ is differentiable in the standard sense, with a gradient denoted by $$\partial _{x} \partial _{\mu } u(\mu)(x) \in \mathbb {R}^{d\times d}$$, and s.t. the mapping $$(\mu,x) \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times \mathbb {R}^{d} \mapsto \partial _{x} \partial _{\mu } u(\mu)(x)$$ is continuous;

2. (ii)

for each fixed x $$\in \mathbb {R}^{d}$$, the mapping $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d}) \mapsto \partial _{\mu } u(\mu)(x)$$ is differentiable in the above lifted sense. Its derivative, interpreted thus as a mapping $$x' \in \mathbb {R}^{d} \mapsto \partial _{\mu } \left [ \partial _{\mu } u(\mu)(x)\right ](x') \in \mathbb {R}^{d\times d}$$ in $$L^{2}_{\mu }(\mathbb {R}^{d\times d})$$, is denoted by $$x' \in \mathbb {R}^{d} \mapsto \partial _{\mu }^{2} u(\mu)(x,x')$$, and s.t. the mapping $$(\mu,x,x') \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times \mathbb {R}^{d}\times \mathbb {R}^{d} \mapsto \partial _{\mu }^{2} u(\mu)(x,x')$$ is continuous.

We say that u $$\in {{\mathcal {C}}^{2}_{b}}({\mathcal {P}}_{_{2}}(\mathbb {R}^{d}))$$ if it is fully $${\mathcal {C}}^{2}, \partial _{x} \partial _{\mu } u(\mu) \in L_{\mu }^{\infty }(\mathbb {R}^{d\times d}), \partial _{\mu }^{2} u(\mu) \in L_{\mu \otimes \mu }^{\infty }(\mathbb {R}^{d\times d})$$ for any $$\mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, and for any compact set $${\mathcal {K}}$$ of $${\mathcal {P}}_{_{2}}(\mathbb {R}^{d})$$, we have

$$\begin{array}{@{}rcl@{}} \sup_{\mu \in {\mathcal{K}}} \left[ \int_{\mathbb{R}^{d}} \left| \partial_{\mu} u(\mu)(x) \right|^{2}\mu(dx) + \left\| \partial_{x} \partial_{\mu} u(\mu)\right\|_{\infty} + \left\| \partial_{\mu}^{2} u(\mu)\right\|_{\infty} \right] & < & \infty. \end{array}$$

We next need Itô’s formula along a flow of conditional measures, proved in (Carmona and Delarue 2014) for processes with common noise. In our context, for the flow of the conditional law $$\rho _{t}^{\alpha }$$, $$0\leq t\leq T, \alpha \in {\mathcal {A}}$$, it is formulated as follows. Let $$u \in {{\mathcal {C}}^{2}_{b}}({\mathcal {P}}_{_{2}}(\mathbb {R}^{d}))$$. Then, for all $$t \in [0,T]$$, we have

$$\begin{array}{@{}rcl@{}} u(\rho_{t}^{\alpha}) &=& u({\mathcal{L}}(\xi_{0})) + {\int_{0}^{t}} \left[ \rho_{s}^{\alpha} \left(\mathbb{L}_{s}^{\alpha_{s}} u(\rho_{s}^{\alpha}) \right) + \rho_{s}^{\alpha}\otimes\rho_{s}^{\alpha} \left(\mathbb{M}_{s}^{\alpha_{s}} u(\rho_{s}^{\alpha}) \right) \right] ds \\ & & \hspace{3cm} \;\;\; + \; {\int_{0}^{t}} \rho_{s}^{\alpha}\left(\mathbb{D}_{s}^{\alpha_{s}} u(\rho_{s}^{\alpha}) \right) d{W_{s}^{0}}, \end{array}$$
(13)

where for $$(t,\mu,a) \in [0,T]\times {\mathcal {P}}_{_{2}}(\mathbb {R}^{d})\times A, {\mathbb {L}_{t}^{a}} u(\mu), {\mathbb {D}_{t}^{a}} u(\mu)$$ are the $${{\mathcal {F}}_{T}^{0}}$$-measurable random functions in $$L_{\mu }^{2}(\mathbb {R})$$ defined by

$$\begin{array}{@{}rcl@{}} {\mathbb{L}_{t}^{a}} u(\mu)(x) &:=& b_{t}(x,\bar\mu,a).\partial_{\mu} u(\mu)(x) + \frac{1}{2}\text{tr}\left(\partial_{x}\partial_{\mu} u(\mu)(x)\left(\sigma_{t}\sigma_{t}^{\scriptscriptstyle{\intercal}} + {\sigma_{t}^{0}}({\sigma_{t}^{0}})^{\scriptscriptstyle{\intercal}}\right)(x,\bar\mu,a) \right), \\ {\mathbb{D}_{t}^{a}} u(\mu)(x) &:=& \partial_{\mu} u(\mu)(x)^{\scriptscriptstyle{\intercal}} {\sigma_{t}^{0}}(x,\bar\mu,a), \end{array}$$

and $${\mathbb {M}_{t}^{a}} u(\mu)$$ is the $${{\mathcal {F}}_{T}^{0}}$$-measurable random function in $$L_{\mu \otimes \mu }^{2}(\mathbb {R})$$ defined by

$$\begin{array}{@{}rcl@{}} {\mathbb{M}_{t}^{a}} u(\mu)(x,x') &:=& \frac{1}{2}\text{tr}\left(\partial^{2}_{\mu} u(\mu)(x,x'){\sigma_{t}^{0}}(x,\bar\mu,a)({\sigma_{t}^{0}})^{\scriptscriptstyle{\intercal}}(x',\bar\mu,a) \right). \end{array}$$

The dynamic programming verification result in Lemma 2.1 and Itô’s formula (13) are valid for a general stochastic McKean-Vlasov equation (beyond the LQ framework). By combining them with an Itô-Kunita type formula for random field processes, similar to the one in (Kunita 1982), one could apply it to $$\left \{w_{t}(\rho _{t}^{\alpha })+ {\int _{0}^{t}} \hat {f}_{s}\left (\rho _{s}^{\alpha },\alpha _{s}\right) ds,0\leq t\leq T\right \}$$ in order to derive a stochastic Hamilton-Jacobi-Bellman equation, i.e., a backward stochastic partial differential equation (BSPDE) for $$w_{t}(\mu)$$, as done in (Peng 1992) for controlled diffusion processes with random coefficients. We postpone this general approach for further study and, in the next sections, return to the important special case of the LQCMKV problem, for which we show that the BSPDE reduces to backward stochastic Riccati equations (BSRE), as in the classical LQ framework.

## Backward stochastic Riccati equations

We search for an $$\mathbb {F}^{0}$$-adapted random field solution to the LQCMKV problem in the quadratic form

$$\begin{array}{@{}rcl@{}} w_{t}(\mu) &=& \text{Var}(\mu,K_{t}) + v_{2}(\mu,\Lambda_{t}) + v_{1}(\mu,Y_{t}) + \chi_{t}, \end{array}$$
(14)

for some $$\mathbb {F}^{0}$$-adapted processes (K,Λ,Y,χ), with values in $$\mathbb {S}^{d}\times \mathbb {S}^{d}\times \mathbb {R}^{d}\times \mathbb {R}$$, and in the backward SDE form

$$\left\{ \begin{array}{ccll} {dK}_{t} &=& \dot K_{t} dt + {Z_{t}^{K}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, & K_{T} = P \\ d\Lambda_{t} &=& \dot\Lambda_{t} dt + Z_{t}^{\Lambda} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, & \Lambda_{T} = P + \bar P \\ {dY}_{t} &=& \dot Y_{t} dt + {Z_{t}^{Y}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, & Y_{T} = L \\ d\chi_{t} & =& \dot\chi_{t} dt + Z_{t}^{\chi} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, & \chi_{T} = 0, \end{array}\right.$$
(15)

for some $$\mathbb {F}^{0}$$-adapted processes $$\dot K, \dot \Lambda, Z^{K}, Z^{\Lambda }$$ with values in $$\mathbb {S}^{d}, \dot Y, Z^{Y}$$ with values in $$\mathbb {R}^{d}$$, and $$\dot \chi, Z^{\chi }$$ with values in $$\mathbb {R}$$. Notice that the terminal conditions in (15) ensure by (7) that $$w$$ in (14) satisfies $$w_{T}(\mu) = \hat {g}(\mu)$$, and we shall next determine the generators $$\dot K, \dot \Lambda, \dot Y$$, and $$\dot \chi$$ in order to satisfy the local (sub)martingale conditions of Lemma 2.1. Notice that the functions $$\text{Var}, v_{2}, v_{1}$$ are smooth w.r.t. both their arguments, and we have

$$\begin{array}{c} \partial_{\mu} \text{Var}(\mu,k)(x) \,=\, 2 k (x-\bar\mu), \;\; \partial_{x}\partial_{\mu} \text{Var}(\mu,k)(x) \;=\; 2k \, = \, - \partial_{\mu}^{2} \text{Var}(\mu,k)(x,x'),\\ \partial_{k} \text{Var}(\mu,k) \;=\; \text{Var}(\mu) \; := \; \int (x-\bar\mu)(x-\bar\mu)^{\scriptscriptstyle{\intercal}} \mu(dx) \\ \partial_{\mu} v_{2}(\mu,\ell)(x) \;=\; 2 \ell \bar\mu, \;\; \partial_{x}\partial_{\mu} v_{2}(\mu,\ell)(x) \;=\; 0, \;\; \partial_{\mu}^{2} v_{2}(\mu,\ell)(x,x') \;=\; 2\ell, \\ \partial_{\ell} v_{2}(\mu,\ell) \;=\; \bar\mu\bar\mu^{\scriptscriptstyle{\intercal}} \\ \partial_{\mu} v_{1}(\mu,y) \; = \; y, \;\; \partial_{x}\partial_{\mu} v_{1}(\mu,y) \; = \; 0 \; = \; \partial_{\mu}^{2} v_{1}(\mu,y)(x,x'), \;\; \partial_{y} v_{1}(\mu,y) \; = \; \bar\mu. \end{array}$$
(16)
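The first identity in (16) can be checked numerically on an empirical measure. The sketch below is only an illustration under stated assumptions: it takes $$\mu = \frac{1}{n}\sum_{i} \delta_{x_{i}}$$ with randomly drawn atoms, uses the fact that for the lift $$U(x_{1},\ldots,x_{n}) = \text{Var}(\mu,k)$$ one has $$n\,\partial_{x_{i}} U = \partial_{\mu} \text{Var}(\mu,k)(x_{i})$$ (which can be verified by direct differentiation here), and compares a central finite difference with $$2k(x_{i}-\bar\mu)$$. The names `var_functional` and `lions_derivative_fd` are hypothetical helpers, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200                      # state dimension and number of atoms (arbitrary choices)
x = rng.normal(size=(n, d))        # atoms of the empirical measure mu = (1/n) sum_i delta_{x_i}
a = rng.normal(size=(d, d))
k = a @ a.T                        # a symmetric matrix k in S^d


def var_functional(points, k):
    """Var(mu, k) = int (x - mean)^T k (x - mean) mu(dx) for the empirical measure."""
    centered = points - points.mean(axis=0)
    return np.einsum("ni,ij,nj->", centered, k, centered) / len(points)


def lions_derivative_fd(points, k, i, eps=1e-6):
    """Central finite-difference estimate of n * dU/dx_i for U = Var(mu, k)."""
    grad = np.zeros(points.shape[1])
    for j in range(points.shape[1]):
        up, down = points.copy(), points.copy()
        up[i, j] += eps
        down[i, j] -= eps
        grad[j] = (var_functional(up, k) - var_functional(down, k)) / (2 * eps)
    return len(points) * grad


i = 7                                            # index of the atom at which we differentiate
closed_form = 2 * k @ (x[i] - x.mean(axis=0))    # first identity in (16)
numerical = lions_derivative_fd(x, k, i)
max_error = np.max(np.abs(closed_form - numerical))
```

Since $$\text{Var}(\mu,k)$$ is quadratic in the atoms, the central difference is exact up to rounding, so `max_error` should be very small.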

Let us denote, for any $$\alpha \in {\mathcal {A}}$$, by S α the $$\mathbb {F}^{0}$$-adapted process equal to $$S_{t}^{\alpha } = w_{t}(\rho _{t}^{\alpha }) + {\int _{0}^{t}} \hat {f}_{s}(\rho _{s}^{\alpha },\alpha _{s}) ds, 0\leq t\leq T$$, and observe then by Itô’s formula (13) that it is of the form

$$\begin{array}{@{}rcl@{}} {dS}_{t}^{\alpha} &=& D_{t}^{\alpha} dt + \Sigma_{t}^{\alpha} d{W_{t}^{0}}, \end{array}$$

with a drift term $$D_{t}^{\alpha } = {\mathcal {D}}_{t}\left (\rho _{t}^{\alpha },\alpha _{t},K_{t},\Lambda _{t},Y_{t}\right)$$ given by

$$\begin{array}{@{}rcl@{}} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &=& \hat{f}_{t}(\mu,a) + \mu \left({\mathbb{L}_{t}^{a}} \text{Var}(\mu,k) + {\mathbb{L}_{t}^{a}} v_{2}(\mu,\ell) + {\mathbb{L}_{t}^{a}} v_{1}(\mu,y) \right) \\ & & +\, \mu\otimes\mu \left({\mathbb{M}_{t}^{a}} \text{Var}(\mu,k) + {\mathbb{M}_{t}^{a}} v_{2}(\mu,\ell) + {\mathbb{M}_{t}^{a}} v_{1}(\mu,y) \right) \\ & & +\, \text{tr}\left(\partial_{k} \text{Var}(\mu,k)^{\scriptscriptstyle{\intercal}}\dot K_{t}\right) + \text{tr}\left(\partial_{\ell} v_{2}(\mu,\ell)^{\scriptscriptstyle{\intercal}}\dot\Lambda_{t}\right) + \partial_{y} v_{1}(\mu,y)^{\scriptscriptstyle{\intercal}}\dot Y_{t} + \dot\chi_{t} \\ & & + \, \text{tr}\left(\partial_{k} \mu \left({\mathbb{D}_{t}^{a}} \text{Var}(\mu,k) \right)^{\scriptscriptstyle{\intercal}} {Z_{t}^{K}}\right) + \text{tr}\left(\partial_{\ell} \mu \left({\mathbb{D}_{t}^{a}} v_{2}(\mu,\ell) \right)^{\scriptscriptstyle{\intercal}} Z_{t}^{\Lambda}\right)\\ && +\, \partial_{y} \mu \left({\mathbb{D}_{t}^{a}} v_{1}(\mu,y) \right)^{\scriptscriptstyle{\intercal}} {Z_{t}^{Y}}, \end{array}$$

for all $$t \in [0,T], \mu \in {\mathcal {P}}_{_{2}}\left (\mathbb {R}^{d}\right), k,\ell \in \mathbb {S}^{d}$$, $$y \in \mathbb {R}^{d}$$, $$a \in A$$. (The second-order derivative terms w.r.t. $$k$$, $$\ell$$ and $$y$$ do not appear since the functions $$\text{Var}$$, $$v_{2}$$ and $$v_{1}$$ are linear in $$k$$, $$\ell$$ and $$y$$, respectively.) From the expressions of the derivatives of $$\text{Var}$$, $$v_{2}$$ and $$v_{1}$$ in (16), we then have

$$\begin{aligned} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &= \hat f_{t}(\mu,a) + \int b_{t}(x,\bar\mu,a)^{\scriptscriptstyle{\intercal}} \left[2k(x-\bar\mu) + 2\ell \bar\mu + y\right] \mu(dx) \\ & \quad + \, \int \left[\sigma_{t}(x,\bar\mu,a)^{\scriptscriptstyle{\intercal}} k \sigma_{t}(x,\bar\mu,a) + {\sigma_{t}^{0}}(x,\bar\mu,a)^{\scriptscriptstyle{\intercal}} k {\sigma_{t}^{0}}(x,\bar\mu,a) \right] \mu(dx) \\ & \quad +\, \left(\int {\sigma_{t}^{0}}(x,\bar\mu,a)\mu(dx)\right)^{\scriptscriptstyle{\intercal}} (\ell-k) \left(\int {\sigma_{t}^{0}}(x,\bar\mu,a) \mu(dx) \right) \\ & \quad + \, \text{Var}(\mu,\dot K_{t}) + v_{2}(\mu,\dot\Lambda_{t}) + v_{1}(\mu,\dot Y_{t}) + \dot\chi_{t} \\ & \quad + \, \int {\sigma_{t}^{0}}(x,\bar\mu,a)^{\scriptscriptstyle{\intercal}} \left[2{Z_{t}^{K}} (x-\bar\mu) + 2Z_{t}^{\Lambda} \bar\mu + {Z_{t}^{Y}}\right] \mu(dx). \end{aligned}$$
(17)
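The term $$\left(\int \sigma_{t}^{0}\,\mu(dx)\right)^{\scriptscriptstyle{\intercal}}(\ell-k)\left(\int \sigma_{t}^{0}\,\mu(dx)\right)$$ in (17) comes from the $$\mu\otimes\mu$$ integral of $${\mathbb{M}_{t}^{a}}\text{Var} + {\mathbb{M}_{t}^{a}} v_{2}$$ once the second-order derivatives $$-2k$$ and $$2\ell$$ from (16) are inserted. As a hedged sanity check, the sketch below evaluates the double integral as a double sum over an empirical measure; the array `s` is a hypothetical stand-in for the values $$\sigma_{t}^{0}(x_{i},\bar\mu,a)$$ at the atoms and is simply drawn at random.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 50
s = rng.normal(size=(n, d))        # stand-in for sigma^0_t(x_i, mu_bar, a) at each atom x_i


def sym(mat):
    """Symmetric part of a square matrix."""
    return (mat + mat.T) / 2


k, ell = sym(rng.normal(size=(d, d))), sym(rng.normal(size=(d, d)))

# mu (x) mu applied to M Var + M v_2 with d2_mu Var = -2k and d2_mu v_2 = 2 ell:
# (1/n^2) sum_{i,j} (1/2) tr((2 ell - 2 k) s_i s_j^T)
double_sum = 0.0
for i in range(n):
    for j in range(n):
        double_sum += 0.5 * np.trace((2 * ell - 2 * k) @ np.outer(s[i], s[j]))
double_sum /= n**2

# Closed form appearing in (17): (mean of sigma^0)^T (ell - k) (mean of sigma^0)
s_bar = s.mean(axis=0)
closed_form = s_bar @ (ell - k) @ s_bar
gap = abs(double_sum - closed_form)
```

The two quantities agree up to floating-point error, which reflects that the double integral factorizes into a product of means.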

We now distinguish between the cases when the control set A is $$\mathbb {R}^{m}$$ (LQCMKV1) or $$L(\mathbb {R}^{d};\mathbb {R}^{m})$$ (LQCMKV2).

### Control set $$\boldsymbol {A} \boldsymbol {=} \boldsymbol {\mathbb {R}^{m}}$$

From the linear form of $$b_{t}, \sigma _{t}, {\sigma _{t}^{0}}$$ in (2), and the quadratic form of $$\hat {f}_{t}$$ in (7), after some straightforward calculations, we have:

$$\begin{array}{@{}rcl@{}} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &=& \text{Var}\left(\mu, \Phi_{t}\left(k,{Z_{t}^{K}}\right) + \dot K_{t}\right) + v_{2}\left(\mu,\Psi_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) + \dot\Lambda_{t}\right) \\ & & \; + \; v_{1}\left(\mu, \Theta_{t}\left(k,\ell,Z_{t}^{\Lambda},y,{Z_{t}^{Y}}\right) +\dot Y_{t}\right) + \Delta_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) + \dot\chi_{t} \\ & & \; + \; a^{\scriptscriptstyle{\intercal}} \Gamma_{t}(k,\ell) a + \left[2 U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right)\bar\mu + R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right)\right]^{\scriptscriptstyle{\intercal}} a \end{array}$$

with

$${\small{\left\{ \begin{array}{ccl} \Phi_{t}\left(k,{Z_{t}^{K}}\right) &=& Q_{t} + B_{t}^{\scriptscriptstyle{\intercal}} k + k B_{t} + D_{t}^{\scriptscriptstyle{\intercal}} k D_{t} + \left({D_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}} k {D_{t}^{0}} + \left({D_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}} {Z_{t}^{K}} + {Z_{t}^{K}} {D_{t}^{0}} \\ \Psi_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) &=& Q_{t} + \bar Q_{t} + \left(D_{t}+\bar D_{t}\right)^{\scriptscriptstyle{\intercal}} k \left(D_{t}+\bar D_{t}\right) + \left({D_{t}^{0}}+ \bar {D}_{t}^{0}\right)^{\scriptscriptstyle{\intercal}}\ell\left({D_{t}^{0}}+\bar {D}_{t}^{0}\right) \\ & & \; + \; (B_{t}+\bar B_{t})^{\scriptscriptstyle{\intercal}} \ell + \ell(B_{t}+\bar B_{t}) + \left({D_{t}^{0}} + \bar {D}_{t}^{0}\right)^{\scriptscriptstyle{\intercal}} Z_{t}^{\Lambda} + Z_{t}^{\Lambda}\left({D_{t}^{0}}+\bar {D}_{t}^{0}\right) \\ \Theta_{t}\left(k,\ell,Z_{t}^{\Lambda},y,{Z_{t}^{Y}}\right) &=& M_{t} + (B_{t}+\bar B_{t})^{\scriptscriptstyle{\intercal}} y + 2\ell {b_{t}^{0}} + 2(D_{t}+\bar D_{t})^{\scriptscriptstyle{\intercal}} k \gamma_{t} + 2\left({D_{t}^{0}}+\bar {D}_{t}^{0}\right)^{\scriptscriptstyle{\intercal}} \ell {\gamma_{t}^{0}} \\ && \; + \; \left({D_{t}^{0}}+\bar {D}_{t}^{0}\right)^{\scriptscriptstyle{\intercal}} {Z_{t}^{Y}} + 2 Z_{t}^{\Lambda} {\gamma_{t}^{0}} \\ \Delta_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) &=& y^{\scriptscriptstyle{\intercal}} {b_{t}^{0}} + \gamma_{t}^{\scriptscriptstyle{\intercal}} k \gamma_{t} + \left({\gamma_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}}\ell{\gamma_{t}^{0}} + \left({Z_{t}^{Y}}\right)^{\scriptscriptstyle{\intercal}} {\gamma_{t}^{0}} \\ \Gamma_{t}(k,\ell) &=& N_{t} + F_{t}^{\scriptscriptstyle{\intercal}} k F_{t} + \left({F_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}}\ell {F_{t}^{0}} \\ U_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) &=& (D_{t}+\bar D_{t})^{\scriptscriptstyle{\intercal}} k F_{t} + \left({D_{t}^{0}}+\bar {D}_{t}^{0}\right)^{\scriptscriptstyle{\intercal}}\ell {F_{t}^{0}} + \ell C_{t} + Z_{t}^{\Lambda} {F_{t}^{0}} \\ R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) &=& 2 F_{t}^{\scriptscriptstyle{\intercal}} k \gamma_{t} + 2 \left({F_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}} \ell {\gamma_{t}^{0}} + C_{t}^{\scriptscriptstyle{\intercal}} y + \left({F_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}} {Z_{t}^{Y}}. \end{array} \right.}}$$
(18)

Then, after square completion under the condition that $$\Gamma_{t}(k,\ell)$$ is positive definite in $$\mathbb {S}^{m}$$, we have

$$\begin{aligned} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &= \text{Var}\left(\mu, \Phi_{t}\left(k,{Z_{t}^{K}}\right) + \dot K_{t}\right) \\ &\quad \, + \, v_{2}\left(\mu,\Psi_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) - U_{t}\left(k,\ell,Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(k,\ell)U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right) + \dot\Lambda_{t}\right) \\ &\quad \, + \, v_{1}\left(\mu, \Theta_{t}\left(k,\ell,Z_{t}^{\Lambda},y,{Z_{t}^{Y}}\right) - U_{t}\left(k,\ell,Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(k,\ell)R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) + \dot Y_{t}\right) \\ &\quad \, + \, \Delta_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) - \frac{1}{4} R_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,y,{Z_{t}^{Y}}\right)\Gamma_{t}^{-1}(k,\ell)R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) + \dot\chi_{t} \\ &\quad \, + \, \left(a - \hat{a}_{t}(\bar\mu,k,\ell,y)\right)^{\scriptscriptstyle{\intercal}} \Gamma_{t}(k,\ell) \left(a - \hat{a}_{t}(\bar\mu,k,\ell,y)\right), \end{aligned}$$

where

$$\begin{array}{@{}rcl@{}} \hat{a}_{t}(\bar\mu,k,\ell,y) &=& - \Gamma_{t}^{-1}(k,\ell) \left[ U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right) \bar\mu + \frac{1}{2} R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right)\right]. \end{array}$$
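The square completion can be verified numerically for the control-dependent part $$a^{\scriptscriptstyle{\intercal}} \Gamma_{t} a + \left[2 U_{t}^{\scriptscriptstyle{\intercal}}\bar\mu + R_{t}\right]^{\scriptscriptstyle{\intercal}} a$$. The following is a minimal sketch, assuming randomly generated stand-ins for $$\Gamma_{t}(k,\ell)$$ (made positive definite by construction), $$U_{t}$$, $$R_{t}$$, $$\bar\mu$$ and $$a$$; none of these numerical values comes from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 2
g = rng.normal(size=(m, m))
gamma = g @ g.T + np.eye(m)        # stand-in for Gamma_t(k, ell), positive definite
u = rng.normal(size=(d, m))        # stand-in for U_t(k, ell, Z^Lambda), a d x m matrix
r = rng.normal(size=m)             # stand-in for R_t(k, ell, y, Z^Y)
mu_bar = rng.normal(size=d)        # mean of mu
a = rng.normal(size=m)             # an arbitrary control value

gamma_inv = np.linalg.inv(gamma)
a_hat = -gamma_inv @ (u.T @ mu_bar + 0.5 * r)    # candidate minimizer

# Control-dependent terms before completing the square.
lhs = a @ gamma @ a + (2 * u.T @ mu_bar + r) @ a

# Completed square plus the corrections absorbed into the v_2, v_1 and constant terms.
rhs = (
    (a - a_hat) @ gamma @ (a - a_hat)
    - mu_bar @ u @ gamma_inv @ u.T @ mu_bar
    - mu_bar @ u @ gamma_inv @ r
    - 0.25 * r @ gamma_inv @ r
)
gap = abs(lhs - rhs)
value_at_minimizer = a_hat @ gamma @ a_hat + (2 * u.T @ mu_bar + r) @ a_hat
```

Because $$\Gamma_{t}$$ is positive definite, the value at `a_hat` is no larger than at any other `a`, which is what makes the drift nonnegative once the remaining terms are cancelled.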

Therefore, whenever

$$\begin{array}{@{}rcl@{}} \dot K_{t} + \Phi_{t}\left(K_{t},{Z_{t}^{K}}\right) & =& 0, \\ \dot\Lambda_{t} + \Psi_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right) - U_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right) \Gamma_{t}^{-1}(K_{t}, \Lambda_{t})U_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right) &=& 0, \\ \dot Y_{t} + \Theta_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda},Y_{t},{Z_{t}^{Y}}\right) - U_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(K_{t}, \Lambda_{t})R_{t}\left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) &=& 0, \\ \dot\chi_{t} + \Delta_{t}\left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) - \frac{1}{4} R_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) \Gamma_{t}^{-1}\left(K_{t},\Lambda_{t}\right) R_{t}\left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) &=& 0, \end{array}$$

holds for all $$0\leq t\leq T$$, we have

$$\begin{array}{@{}rcl@{}} D_{t}^{\alpha} &=& {\mathcal{D}}_{t}(\rho_{t}^{\alpha},\alpha_{t},K_{t},\Lambda_{t},Y_{t}) \\ &=& \left(\alpha_{t} - \hat{a}_{t}\left(\bar\rho_{t}^{\alpha}, K_{t},\Lambda_{t},Y_{t}\right)\right)^{\scriptscriptstyle{\intercal}} \Gamma_{t}(K_{t},\Lambda_{t}) \left(\alpha_{t} - \hat a_{t}\left(\bar\rho_{t}^{\alpha},K_{t},\Lambda_{t},Y_{t}\right)\right), \end{array}$$
(19)

which implies that $$D_{t}^{\alpha } \geq 0$$, $$0\leq t\leq T$$, for all $$\alpha \in {\mathcal {A}}$$, i.e., $$S_{t}^{\alpha } = w_{t}(\rho _{t}^{\alpha }) + {\int _{0}^{t}} \hat f_{s}(\rho _{s}^{\alpha },\alpha _{s}) ds, 0\leq t\leq T$$, satisfies the $$\left (\mathbb {P},\mathbb {F}^{0}\right)$$-local submartingale property for all $$\alpha \in {\mathcal {A}}$$. We are then led to consider the system of BSDEs:

$${\small{\left\{ \begin{array}{ccl} {dK}_{t} &=& - \Phi_{t}\left(K_{t},{Z_{t}^{K}}\right) dt + {Z_{t}^{K}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; K_{T} = P \\ d\Lambda_{t} &=& - \left[\Psi_{t}(K_{t},\Lambda_{t},Z_{t}^{\Lambda}) - U_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(K_{t}, \Lambda_{t})U_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\right] dt \\ & & \hspace{3cm} + \; Z_{t}^{\Lambda} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; \Lambda_{T} = P + \bar P \\ {dY}_{t} &=& -\left[\! \Theta_{t}\! \left(\! K_{t},\Lambda_{t},Z_{t}^{\Lambda},Y_{t},{Z_{t}^{Y}}\right) \,-\, U_{t}\! \left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\! \Gamma_{t}^{-1}\! (K_{t},\Lambda_{t}) R_{t}\! \left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right)\right] dt \\ & & \hspace{3cm} + \; {Z_{t}^{Y}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \;\; Y_{T} = L, \\ d\chi_{t} &=& -\left[ \Delta_{t}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right)\! -\! \frac{1}{4} R_{t}^{\scriptscriptstyle{\intercal}}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right)\! \Gamma_{t}^{-1}\! (K_{t},\Lambda_{t})R_{t}\! \left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) \right] dt \\ & & \hspace{3cm} + \; Z_{t}^{\chi} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \;\; \chi_{T} = 0. \end{array} \right.}}$$
(20)
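To get a feel for the first two equations of (20), one can look at the special case of deterministic coefficients, where a deterministic pair $$(K,\Lambda)$$ with $$Z^{K} = Z^{\Lambda} = 0$$ solves the system and the BSDEs reduce to backward ODEs. The sketch below is an illustration only, under the assumptions of scalar state and control ($$d = m = 1$$) and constant coefficients with hypothetical values; the dictionary keys `Bb`, `Db`, `D0b`, `Qb`, `Pb` are shorthand for the barred coefficients, and the integration scheme (classical fourth-order Runge-Kutta run backward from $$T$$) is a choice made here, not one taken from the text.

```python
import numpy as np

# Hypothetical constant scalar coefficients (d = m = 1); names follow (18).
coef = dict(B=0.0, Bb=0.0, C=1.0, D=0.0, Db=0.0, F=0.0, D0=0.0, D0b=0.0, F0=0.0,
            Q=1.0, Qb=0.0, N=1.0, P=0.0, Pb=0.0)
T, steps = 1.0, 2000


def drift(kl, c):
    """Right-hand sides of dK/dt and dLambda/dt in (20) when Z^K = Z^Lambda = 0."""
    k, lam = kl
    phi = c["Q"] + 2 * c["B"] * k + (c["D"] ** 2 + c["D0"] ** 2) * k
    psi = (c["Q"] + c["Qb"] + (c["D"] + c["Db"]) ** 2 * k
           + (c["D0"] + c["D0b"]) ** 2 * lam + 2 * (c["B"] + c["Bb"]) * lam)
    gam = c["N"] + c["F"] ** 2 * k + c["F0"] ** 2 * lam
    u = (c["D"] + c["Db"]) * k * c["F"] + (c["D0"] + c["D0b"]) * lam * c["F0"] + lam * c["C"]
    return np.array([-phi, -(psi - u ** 2 / gam)])


def solve_backward(c, T, steps):
    """Classical RK4 from t = T down to t = 0; returns the time grid and (K, Lambda)."""
    h = T / steps
    t = np.linspace(0.0, T, steps + 1)
    y = np.zeros((steps + 1, 2))
    y[-1] = [c["P"], c["P"] + c["Pb"]]           # terminal conditions K_T, Lambda_T
    for i in range(steps, 0, -1):
        k1 = drift(y[i], c)
        k2 = drift(y[i] - 0.5 * h * k1, c)
        k3 = drift(y[i] - 0.5 * h * k2, c)
        k4 = drift(y[i] - h * k3, c)
        y[i - 1] = y[i] - h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return t, y


t, kl = solve_backward(coef, T, steps)
```

With the values chosen above, the equations read $$\dot K = -1$$, $$K_{T} = 0$$ and $$\dot\Lambda = -(1-\Lambda^{2})$$, $$\Lambda_{T} = 0$$, so the numerical output can be compared with $$K_{t} = T-t$$ and $$\Lambda_{t} = \tanh(T-t)$$.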

### Definition 3.1

A solution to the system of BSDEs (20) is a quadruple of pairs $$(K,Z^{K}),(\Lambda,Z^{\Lambda}),(Y,Z^{Y}),(\chi,Z^{\chi})$$ of $$\mathbb {F}^{0}$$-adapted processes, with values, respectively, in $$\mathbb {S}^{d}\times \mathbb {S}^{d}, \mathbb {S}^{d}\times \mathbb {S}^{d}, \mathbb {R}^{d}\times \mathbb {R}^{d}, \mathbb {R}\times \mathbb {R}$$, such that $${\int _{0}^{T}} |{Z_{t}^{K}}|^{2} + |Z_{t}^{\Lambda }|^{2} + |{Z_{t}^{Y}}|^{2} + |Z_{t}^{\chi }|^{2} dt < \infty$$ a.s., the matrix process $$\Gamma(K,\Lambda)$$ with values in $$\mathbb {S}^{m}$$ is positive definite a.s., and the following relation

$${\small{\left\{ \begin{array}{ccl} K_{t} &=& P + {\int_{t}^{T}} \Phi_{s}(K_{s},{Z_{s}^{K}}) ds - {\int_{t}^{T}} {Z_{s}^{K}} d{W_{s}^{0}}, \\ \Lambda_{t} &=& P + \bar P + {\int_{t}^{T}} \left[ \Psi_{s}(K_{s},\Lambda_{s},Z_{s}^{\Lambda}) - U_{s}\left(K_{s},\Lambda_{s},Z_{s}^{\Lambda}\right)\Gamma_{s}^{-1}(K_{s}, \Lambda_{s})U_{s}^{\scriptscriptstyle{\intercal}}\left(K_{s},\Lambda_{s},Z_{s}^{\Lambda}\right) \right] ds \\ & & \hspace{3cm} - \; {\int_{t}^{T}} Z_{s}^{\Lambda} d{W_{s}^{0}}, \\ Y_{t} &=& L + {\int_{t}^{T}} \left[ \Theta_{s}\left(K_{s}, \Lambda_{s},Z_{s}^{\Lambda},Y_{s},{Z_{s}^{Y}}\right) - U_{s}\left(K_{s},\Lambda_{s},Z_{s}^{\Lambda}\right)\Gamma_{s}^{-1}(K_{s},\Lambda_{s}) R_{s}\left(K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\right) \right] ds \\ & & \hspace{3cm} - \; {\int_{t}^{T}} {Z_{s}^{Y}} d{W_{s}^{0}}, \\ \chi_{t} &=& {\int_{t}^{T}} \left[ \Delta_{s}(K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}) - \frac{1}{4} R_{s}^{\scriptscriptstyle{\intercal}}\left(K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\right)\Gamma_{s}^{-1}(K_{s},\Lambda_{s})R_{s}\left(K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\right) \right] ds \\ & & \hspace{3cm} - \; {\int_{t}^{T}} Z_{s}^{\chi} d{W_{s}^{0}}, \end{array} \right.}}$$

is satisfied for all $$t \in [0,T]$$.

The following verification result makes the connection between the system (20) and the LQCMKV1 control problem.

### Proposition 3.1

Assume that $$(K,Z^{K}),(\Lambda,Z^{\Lambda}),(Y,Z^{Y}),(\chi,Z^{\chi})$$ is a solution to BSDE (20) such that $$K,\Lambda,\Gamma^{-1}(K,\Lambda)$$ are essentially bounded, $$Z^{\Lambda}$$ lies in $$L^{2}(\Omega\times[0,T])$$, i.e., $$\mathbb {E}\left [\int _{0}^{T} |Z_{t}^{\Lambda }|^{2} dt\right ] < \infty$$, $$Y$$ lies in $${\mathcal {S}}^{2}(\Omega \times [0,T])$$, i.e., $$\mathbb {E}\left [\sup _{0\leq t\leq T}|Y_{t}|^{2}\right ] < \infty$$, and $$\chi$$ lies in $${\mathcal {S}}^{1}(\Omega \times [0,T])$$, i.e., $$\mathbb {E}\left [\sup _{0\leq t\leq T}|\chi _{t}|\right ] < \infty$$. Then, the control process

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*} &=& \hat{a}_{t}\left(\mathbb{E}\left[X_{t}^{*}|W^{0}\right],K_{t},\Lambda_{t},Y_{t}\right) \\ &=& -\! \Gamma_{t}^{-1}(K_{t},\Lambda_{t})\! \left[\! U_{t}^{\scriptscriptstyle{\intercal}}\! \left(\! K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\! \mathbb{E}\! \left[\! X_{t}^{*}|W^{0}\! \right] \,+\, \frac{1}{2} R_{t}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\!\right)\! \right]\!, \, 0 \!\leq\! t\! \leq\! T, \end{array}$$
(21)

where $$X^{*} = X^{\alpha ^{*}}$$ is the state process with the feedback control $$\hat {a}_{t}(.,K_{t},\Lambda _{t},Y_{t})$$, is an optimal control for the LQCMKV1 problem, i.e., $$V_{0} = J(\alpha^{*})$$, and we have

$$\begin{array}{@{}rcl@{}} V_{0} &=& \text{Var}({\mathcal{L}}(\xi_{0}),K_{0}) + v_{2}({\mathcal{L}}(\xi_{0}),\Lambda_{0}) + v_{1}({\mathcal{L}}(\xi_{0}),Y_{0}) + \chi_{0}. \end{array}$$

### Proof

Consider a solution $$(K,Z^{K}),(\Lambda,Z^{\Lambda}),(Y,Z^{Y}),(\chi,Z^{\chi})$$ to the BSDE (20), and $$w$$ of the quadratic form (14). First, notice that $$w$$ satisfies the quadratic growth condition (11) since $$K,\Lambda$$ are essentially bounded, and $$(Y,\chi) \in {\mathcal {S}}^{2}(\Omega \times [0,T])\times {\mathcal {S}}^{1}(\Omega \times [0,T])$$. Moreover, we have the terminal condition $$w_{T}(\mu) = \hat g(\mu)$$. Next, by construction, the process $$D_{t}^{\alpha } = {\mathcal {D}}_{t}(\rho _{t}^{\alpha },\alpha _{t},K_{t},\Lambda _{t},Y_{t}), 0\leq t\leq T$$, is nonnegative, which means that $$S_{t}^{\alpha } = w_{t}(\rho _{t}^{\alpha }) + {\int _{0}^{t}} \hat f_{s}(\rho _{s}^{\alpha },\alpha _{s}) ds, 0\leq t\leq T$$, is a $$(\mathbb {P},\mathbb {F}^{0})$$-local submartingale. Moreover, by choosing the control $$\alpha^{*}$$ in the form (21), we notice that $$X^{*}$$, the solution to a linear stochastic McKean-Vlasov dynamics, satisfies the square integrability condition $$\mathbb {E}[\sup _{0\leq t\leq T}|X_{t}^{*}|^{2}] < \infty$$, and thus $$\mathbb {E}[\int _{0}^{T} |\alpha _{t}^{*}|^{2} dt] < \infty$$, since $$U(K,\Lambda,Z^{\Lambda})$$ inherits from $$Z^{\Lambda}$$ the square integrability condition $$L^{2}(\Omega\times[0,T])$$ and $$\Gamma^{-1}(K,\Lambda)$$ is essentially bounded; hence $$\alpha ^{*} \in {\mathcal {A}}$$. Finally, from (19) we see that $$D^{\alpha ^{*}} = 0$$, which gives the $$(\mathbb {P},\mathbb {F}^{0})$$-local martingale property of $$S^{\alpha ^{*}}$$, and we conclude by the dynamic programming verification Lemma 2.1. □

Let us now show, under assumptions (H1) and (H2), the existence of a solution to the BSDE (20) satisfying the integrability conditions of Proposition 3.1. We point out that this system is decoupled:

1. (i)

One first considers the BSDE for $$(K,Z^{K})$$ whose generator $$(k,z) \in \mathbb {S}^{d}\times \mathbb {S}^{d} \mapsto \Phi _{t}(k,z) \in \mathbb {S}^{d}$$ is linear, with essentially bounded coefficients. Since the terminal condition $$P$$ is also essentially bounded, it is known by standard results for linear BSDEs that there exists a unique solution $$(K,Z^{K})$$ with values in $$\mathbb {S}^{d}\times \mathbb {S}^{d}$$, s.t. $$K$$ is essentially bounded and $$Z^{K}$$ lies in $$L^{2}(\Omega\times[0,T])$$. Moreover, since $$P$$ and $$\Phi_{t}(0,0) = Q_{t}$$ are nonnegative under (H1), we also obtain by the standard comparison principle for BSDEs that $$K_{t}$$ is nonnegative, for all $$0\leq t\leq T$$.

2. (ii)

Given $$K$$, we next consider the BSDE for $$(\Lambda,Z^{\Lambda})$$ with generator $$(\ell,z) \in \mathbb {S}^{d}\times \mathbb {S}^{d} \mapsto \Psi _{t}(K_{t},\ell,z) - U_{t}(K_{t},\ell,z)\Gamma _{t}^{-1}(K_{t},\ell)U_{t}^{\scriptscriptstyle {\intercal }}(K_{t},\ell,z) \in \mathbb {S}^{d}$$, and terminal condition $$P+\bar P$$. This is a backward stochastic Riccati equation (BSRE), and it is well-known (see, e.g., (Bismut 1976)) that it is associated with a standard stochastic LQ control problem (without McKean-Vlasov dependence), with the controlled linear dynamics and the quadratic cost functional given by

$$\begin{array}{@{}rcl@{}} d\tilde X_{t} &=& \left[(B_{t} + \bar B_{t})\tilde X_{t} + C_{t}\alpha_{t} \right] dt + \left[\left({D_{t}^{0}}+\bar {D}_{t}^{0}\right)\tilde X_{t} + {F_{t}^{0}} \alpha_{t} \right] d{W_{t}^{0}}, \end{array}$$

$$\begin{array}{@{}rcl@{}} \tilde J^{K}(\alpha) &=& \mathbb{E}\! \left[ {\int_{0}^{T}}\! \left(\! \tilde X_{t}^{\scriptscriptstyle{\intercal}} {Q_{t}^{K}} \tilde X_{t} \,+\, \alpha_{t}^{\scriptscriptstyle{\intercal}} {N_{t}^{K}}\alpha_{t} \,+\, 2\tilde X_{t}^{\scriptscriptstyle{\intercal}} {M_{t}^{K}}\alpha_{t} \right)\! dt \, + \, \tilde X_{T}^{\scriptscriptstyle{\intercal}} (\! P \,+\, \bar P)\tilde X_{T} \right], \end{array}$$

where $${Q_{t}^{K}} = Q_{t}+\bar Q_{t} + (D_{t}+\bar D_{t})^{\scriptscriptstyle {\intercal }} K_{t}(D_{t}+\bar D_{t})$$, $${N_{t}^{K}} = N_{t}+F_{t}^{\scriptscriptstyle {\intercal }} K_{t} F_{t}$$, $${M_{t}^{K}} = (D_{t}+\bar D_{t})^{\scriptscriptstyle {\intercal }} K_{t} F_{t}$$. Under the condition that $$N^{K}$$ is positive definite, we can rewrite this cost functional after square completion as

$$\begin{array}{@{}rcl@{}} \tilde J^{K}(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \left(\tilde X_{t}^{\scriptscriptstyle{\intercal}}\tilde {Q}_{t}^{K} \tilde X_{t} + \tilde\alpha_{t}^{\scriptscriptstyle{\intercal}} {N_{t}^{K}}\tilde\alpha_{t} \right) dt \; + \; \tilde X_{T}^{\scriptscriptstyle{\intercal}}(P +\bar P)\tilde X_{T} \right], \end{array}$$

with $$\tilde {Q}_{t}^{K} = {Q_{t}^{K}}-{M_{t}^{K}} \left ({N_{t}^{K}}\right)^{-1}\left ({M_{t}^{K}}\right)^{\scriptscriptstyle {\intercal }}, \tilde \alpha _{t} = \alpha _{t} + \left ({N_{t}^{K}}\right)^{-1}\left ({M_{t}^{K}}\right)^{\scriptscriptstyle {\intercal }}\tilde X_{t}$$. By noting that $$\tilde {Q}_{t}^{K} \geq Q_{t}+\bar Q_{t}$$, it follows that the symmetric matrices $$\tilde Q^{K}$$ and $$P+\bar P$$ are nonnegative under condition (H1), and assuming furthermore that $$N^{K}$$ is uniformly positive definite, we obtain from (Tang 2003) the existence and uniqueness of a solution $$(\Lambda,Z^{\Lambda})$$ to this BSRE, with $$\Lambda$$ being nonnegative and essentially bounded, and $$Z^{\Lambda}$$ square integrable in $$L^{2}(\Omega\times[0,T])$$. This implies, in particular, that $$\Gamma^{-1}(K,\Lambda)$$ is well-defined and essentially bounded. Since $$K$$ is nonnegative under (H1), notice that the uniform positivity condition on $$N^{K}$$ is satisfied under (H2): this is clear when $$N$$ is uniformly positive definite (as usually assumed in LQ problems), and also holds true when $$F$$ is uniformly nondegenerate and $$K$$ is uniformly positive definite, which occurs when $$P$$ or $$Q$$ is uniformly positive definite, by the comparison principle for the linear BSDE for $$K$$.

3. (iii)

Given $$(K,\Lambda,Z^{\Lambda})$$, we consider the BSDE for $$(Y,Z^{Y})$$ with generator $$(y,z) \in \mathbb {R}^{d}\times \mathbb {R}^{d} \mapsto G_{t}(y,z) := \Theta _{t}(K_{t},\Lambda _{t},Z_{t}^{\Lambda },y,z) - U_{t}\left (K_{t},\Lambda _{t},Z_{t}^{\Lambda }\right)\Gamma _{t}^{-1}(K_{t},\Lambda _{t})R_{t}(K_{t},\Lambda _{t},y,z)$$ with values in $$\mathbb {R}^{d}$$, and terminal condition $$L$$. This is a linear BSDE and $$\{G_{t}(0,0), 0\leq t\leq T\}$$ lies in $$L^{2}(\Omega\times[0,T])$$ (recall that $$b^{0}$$, $$\gamma$$, and $$\gamma^{0}$$ are assumed square integrable). By standard results for BSDEs, we then know that there exists a unique solution $$(Y,Z^{Y})$$ s.t. $$Y$$ lies in $${\mathcal {S}}^{2}(\Omega \times [0,T])$$, and $$Z^{Y}$$ lies in $$L^{2}(\Omega\times[0,T])$$.

4. (iv)

Finally, given $$(K,\Lambda,Y,Z^{Y})$$, we solve the backward stochastic equation for $$\chi$$, which is explicitly written as

$$\begin{array}{@{}rcl@{}} \chi_{t} &=& \mathbb{E} \left[ {\int_{t}^{T}} \Delta_{s}\left(K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\right) \right.\\ & & \qquad \left. - \frac{1}{4} R_{s}^{\scriptscriptstyle{\intercal}}\! \! \left(\! K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\! \right)\! \Gamma_{s}^{-1}\!(K_{s},\Lambda_{s})R_{s} \!\!\left(\! K_{s},\Lambda_{s},Y_{s},{Z_{s}^{Y}}\!\! \right)\! ds \! \left| {{\mathcal{F}}_{t}^{0}} \right. \! \right]\!, \, 0\! \leq\! t\! \leq\! T, \end{array}$$

and χ satisfies the $${\mathcal {S}}^{1}(\Omega \times [0,T])$$ integrability condition.
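The inequality $$\tilde {Q}_{t}^{K} \geq Q_{t}+\bar Q_{t}$$ used in step (ii) rests on the fact that $$K - KF(N+F^{\scriptscriptstyle{\intercal}}KF)^{-1}F^{\scriptscriptstyle{\intercal}}K$$ is nonnegative whenever $$K \geq 0$$ and $$N+F^{\scriptscriptstyle{\intercal}}KF$$ is positive definite (a Schur complement argument). The sketch below checks this numerically on randomly generated matrices; all numerical values and the helper `random_psd` are hypothetical, and $$N$$ is taken positive definite for simplicity.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 4, 2


def random_psd(size, shift=0.0):
    """Random symmetric nonnegative matrix, shifted by a multiple of the identity."""
    a = rng.normal(size=(size, size))
    return a @ a.T + shift * np.eye(size)


k = random_psd(d)                  # K_t >= 0, as obtained in step (i)
n_mat = random_psd(m, shift=0.5)   # N_t, taken uniformly positive definite here
f = rng.normal(size=(d, m))        # F_t
dd = rng.normal(size=(d, d))       # D_t + bar D_t
q_sum = random_psd(d)              # Q_t + bar Q_t >= 0

q_k = q_sum + dd.T @ k @ dd        # Q^K_t
n_k = n_mat + f.T @ k @ f          # N^K_t
m_k = dd.T @ k @ f                 # M^K_t
q_tilde = q_k - m_k @ np.linalg.inv(n_k) @ m_k.T

# Smallest eigenvalue of tilde Q^K - (Q + bar Q); it should be nonnegative up to rounding.
min_eig = np.linalg.eigvalsh(q_tilde - q_sum).min()
```

A nonnegative `min_eig` (up to floating-point error) is consistent with $$\tilde Q^{K}$$ inheriting nonnegativity from $$Q+\bar Q$$ under (H1).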

To sum up, we have proved the following result:

### Theorem 3.1

Under assumptions (H1) and (H2), there exists a unique solution $$(K,Z^{K}),(\Lambda,Z^{\Lambda}),(Y,Z^{Y}),(\chi,Z^{\chi})$$ to the BSDE (20) satisfying the integrability conditions of Proposition 3.1, and consequently we have an optimal control for the LQCMKV1 problem given by (21).

### Control set $$\boldsymbol {A} \boldsymbol {=} \boldsymbol {L(\mathbb {R}^{d};\mathbb {R}^{m})}$$

From the linear form of $$b_{t}, \sigma _{t}, {\sigma _{t}^{0}}$$ in (2), and the quadratic form of $$\hat {f}_{t}$$ in (7), the random field process in (17) is given, after some calculations, by

$$\begin{array}{@{}rcl@{}} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &=& \text{Var}\left(\mu, \Phi_{t}\left(k,{Z_{t}^{K}}\right) + \dot K_{t}\right) + v_{2}\left(\mu,\Psi_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) + \dot\Lambda_{t}\right) \\ & & \; + \; v_{1}\left(\mu, \Theta_{t}\left(k,\ell,Z_{t}^{\Lambda},y,{Z_{t}^{Y}}\right) +\dot Y_{t}\right) + \Delta_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) + \dot\chi_{t} \\ & & \; + \; \text{Var}(a\star\mu,\Gamma_{t}(k,k)) + \overline{a\star\mu}^{\scriptscriptstyle{\intercal}} \Gamma_{t}(k,\ell) \overline{a\star\mu} \\ & & \; + \; 2 \int (x-\bar\mu)^{\scriptscriptstyle{\intercal}} V_{t}\left(k,{Z_{t}^{K}}\right) a(x) \mu(dx) \\ & & \; + \; \left[2 U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right)\bar\mu + R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right)\right]^{\scriptscriptstyle{\intercal}} \overline{a\star\mu}, \end{array}$$

for all t $$\in [0,T], \mu \in {\mathcal {P}}_{_{2}}\left (\mathbb {R}^{d}\right), k,\ell \in \mathbb {S}^{d}$$, y $$\in \mathbb {R}^{d}$$, a $$\in L(\mathbb {R}^{d};\mathbb {R}^{m})$$, where $$a\star \mu \in {\mathcal {P}}_{_{2}}(\mathbb {R}^{m})$$ denotes the image by a of μ,

$$\begin{array}{@{}rcl@{}} \overline{a\star\mu} \; =\! \int \! a(x) \mu(dx), & & \! \text{Var}(a\star\mu,k) \; = \int \! \left(a(x) \,-\, \overline{a\star\mu} \right)^{\scriptscriptstyle{\intercal}} k \left(a(x) \,-\, \overline{a\star\mu} \right) \mu(dx), \end{array}$$

and we keep the same notations as in (18) with the additional term:

$$\begin{array}{@{}rcl@{}} V_{t}\left(k,{Z_{t}^{K}}\right) &=& D_{t}^{\scriptscriptstyle{\intercal}} k F_{t} + \left({D_{t}^{0}}\right)^{\scriptscriptstyle{\intercal}} k {F_{t}^{0}} + k C_{t} + {Z_{t}^{K}} {F_{t}^{0}}. \end{array}$$
(22)

Then, after square completion under the condition that $$\Gamma_{t}(k,\ell)$$ is positive definite in $$\mathbb {S}^{m}$$, we have

$$\begin{array}{@{}rcl@{}} {\mathcal{D}}_{t}(\mu,a,k,\ell,y) &=& \text{Var}\left(\mu, \Phi_{t}\left(k,{Z_{t}^{K}}\right) - V_{t}\left(k,{Z_{t}^{K}}\right)\Gamma_{t}^{-1}(k,k)V_{t}^{\scriptscriptstyle{\intercal}}\left(k,{Z_{t}^{K}}\right) + \dot K_{t}\right) \\ & & \;+\; v_{2}\left(\mu,\Psi_{t}\left(k,\ell,Z_{t}^{\Lambda}\right) - U_{t}\left(k,\ell,Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(k,\ell)U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right) + \dot\Lambda_{t}\right) \\ & & \; + \; v_{1}\! \!\left(\! \mu,\! \Theta_{t}\! \left(\! k,\! \ell,\! Z_{t}^{\Lambda},y,\! {Z_{t}^{Y}}\!\right) \,-\, U_{t}\!\left(\! k,\ell,Z_{t}^{\Lambda}\! \right)\! \Gamma_{t}^{-1}\!(k,\ell)R_{t}\!\left(k,\ell,y,{Z_{t}^{Y}}\right) \,+\, \dot Y_{t} \!\right) \\ & & \; + \; \Delta_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) \,-\, \frac{1}{4} R_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,y,{Z_{t}^{Y}}\right)\Gamma_{t}^{-1}(k,\ell)R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right) \!+ \dot\chi_{t} \\ & & \; + \; \text{Var}\left((a-\hat{\mathbf{a}}_{t})(.,\bar\mu,k,\ell,y)\star\mu,\Gamma_{t}(k,k)\right) \\ & & \; + \; \overline{(a-\hat{\mathbf{a}}_{t})(.,\bar\mu,k,\ell,y)\star\mu}^{\scriptscriptstyle{\intercal}} \Gamma_{t}(k,\ell) \overline{(a-\hat{\mathbf{a}}_{t})(.,\bar\mu,k,\ell,y)\star\mu} \end{array}$$

where $$\hat {\mathbf {a}}_{t}(.,\bar \mu,k,\ell,y) : \mathbb {R}^{d} \rightarrow \mathbb {R}^{m}$$ is defined by

$$\begin{array}{@{}rcl@{}} \hat{\mathbf{a}}_{t}(x,\bar\mu,k,\ell,y) &=& - \Gamma_{t}^{-1}(k,k)V_{t}\left(k,{Z_{t}^{K}}\right)^{\scriptscriptstyle{\intercal}} (x-\bar\mu) \\ & & \;\;\;\,\, - \, \Gamma_{t}^{-1}(k,\ell)\! \left[\! U_{t}^{\scriptscriptstyle{\intercal}}\left(k,\ell,Z_{t}^{\Lambda}\right) \bar\mu \,+\, \frac{1}{2} R_{t}\left(k,\ell,y,{Z_{t}^{Y}}\right)\! \right], \;\; \!x \in \mathbb{R}^{d}. \end{array}$$

We then consider the system of BSDEs:

$${\small{\left\{ \begin{array}{ccl} {dK}_{t} &=& - \left[\Phi_{t}\left(K_{t},{Z_{t}^{K}}\right) - V_{t}\left(K_{t},{Z_{t}^{K}}\right)\Gamma_{t}^{-1}(K_{t},K_{t})V_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},{Z_{t}^{K}}\right) \right]dt \\ & &\hspace{3cm} + \; {Z_{t}^{K}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; K_{T} = P \\ d\Lambda_{t} &=& - \left[\Psi_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right) - U_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(K_{t},\Lambda_{t})U_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\right] dt \\ & & \hspace{3cm} + \; Z_{t}^{\Lambda} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; \Lambda_{T} = P + \bar P \\ {dY}_{t} &=& -\left[\! \Theta_{t}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda},Y_{t},{Z_{t}^{Y}}\right) \,-\, U_{t}\! \left(\! K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right)\Gamma_{t}^{-1}(K_{t},\Lambda_{t}) R_{t}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right)\! \right]\! dt \\ & & \hspace{3cm} + \; {Z_{t}^{Y}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \;\; Y_{T} = L, \\ d\chi_{t} &=& -\left[\! \Delta_{t}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) \,-\, \frac{1}{4} R_{t}^{\scriptscriptstyle{\intercal}}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right) \Gamma_{t}^{-1}(K_{t},\Lambda_{t})R_{t}\! \left(\! K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\!\right)\! \right] dt \\ & & \hspace{3cm} + \; Z_{t}^{\chi} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \;\; \chi_{T} = 0, \end{array} \right.}}$$
(23)

and by the same arguments as in Proposition 3.1, we have the following verification result making the connection between the system (23) and the LQCMKV2 control problem.

### Proposition 3.2

Assume that $$(K,Z^{K})$$, $$(\Lambda,Z^{\Lambda})$$, $$(Y,Z^{Y})$$, $$(\chi,Z^{\chi})$$ is a solution to the BSDE (23) such that $$K$$, $$\Lambda$$, $$\Gamma^{-1}(K,\Lambda)$$ are essentially bounded, Y lies in $${\mathcal {S}}^{2}(\Omega \times [0,T])$$, and χ lies in $${\mathcal {S}}^{1}(\Omega \times [0,T])$$. Then, the control process $$\alpha^{*}$$ with values in $$L(\mathbb {R}^{d};\mathbb {R}^{m})$$ and defined by

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*}(x) &=& \hat{\mathbf{a}}_{t}\left(x,\mathbb{E}\left[X_{t}^{*}|W^{0}\right],K_{t},\Lambda_{t},Y_{t}\right) \\ &=& - \Gamma_{t}^{-1}(K_{t},K_{t})V_{t}\left(K_{t},{Z_{t}^{K}}\right)^{\scriptscriptstyle{\intercal}} \left(x- \mathbb{E}\left[X_{t}^{*}|W^{0}\right]\right) \\ & & - \; \Gamma_{t}^{-1}(K_{t},\Lambda_{t}) \left[ U_{t}^{\scriptscriptstyle{\intercal}}\left(K_{t},\Lambda_{t},Z_{t}^{\Lambda}\right) \mathbb{E}\left[X_{t}^{*}|W^{0}\right] + \frac{1}{2} R_{t}\left(K_{t},\Lambda_{t},Y_{t},{Z_{t}^{Y}}\right)\right], \;\;\; x \in \mathbb{R}^{d}, \; 0 \leq t \leq T, \end{array}$$

where $$X^{*} = X^{\alpha ^{*}}$$ is the state process with the feedback control $$\hat {\mathbf {a}}_{t}(.,.,K_{t},\Lambda _{t},Y_{t})$$, is an optimal control for the LQCMKV2 problem, i.e., $$V_{0} = J(\alpha^{*})$$, and we have

$$\begin{array}{@{}rcl@{}} V_{0} &=& \text{Var}({\mathcal{L}}(\xi_{0}),K_{0}) + v_{2}({\mathcal{L}}(\xi_{0}),\Lambda_{0}) + v_{1}({\mathcal{L}}(\xi_{0}),Y_{0}) + \chi_{0}. \end{array}$$

Let us now discuss the existence of a solution to the BSDE (23) satisfying the integrability conditions of Proposition 3.2. As for (20), this system is decoupled. The difference w.r.t. the LQCMKV1 problem is in the BSDE for $$(K,Z^{K})$$, where the generator $$(k,z) \in \mathbb {S}^{d}\times \mathbb {S}^{d} \mapsto \Phi _{t}(k,z)-V_{t}(k,z)\Gamma _{t}^{-1}(k,k)V_{t}^{\scriptscriptstyle {\intercal }}(k,z) \in \mathbb {S}^{d}$$ is now of the Riccati type. In general, it is not in the class of BSREs related to LQ control problems, but existence can be obtained in some particular cases:

(1) The coefficients $$B, C, D, F, D^{0}, F^{0}, Q, P, N$$ are deterministic. In this case, the BSRE for K is reduced to a matrix Riccati ordinary differential equation:

$$\begin{array}{@{}rcl@{}} - \frac{{dK}_{t}}{dt} &=& \Phi_{t}(K_{t},0) - V_{t}(K_{t},0) \Gamma_{t}^{-1}(K_{t},K_{t})V_{t}^{\scriptscriptstyle{\intercal}}(K_{t},0), \;\;\; 0 \leq t \leq T, \; K_{T} \; = \; P. \end{array}$$

This problem is associated to the LQ problem with controlled linear dynamics

$$\begin{array}{@{}rcl@{}} d\tilde X_{t} &=& (B_{t} \tilde X_{t} + C_{t} \tilde\alpha_{t}) dt + (D_{t}\tilde X_{t} + F_{t}\tilde\alpha_{t}) {dW}_{t} + \left({D_{t}^{0}}\tilde X_{t} + {F_{t}^{0}}\tilde\alpha_{t}\right) d{W_{t}^{0}}, \end{array}$$

where the control process $$\tilde \alpha$$ is an $$\mathbb {F}$$-adapted process with values in $$\mathbb {R}^{m}$$, and the cost functional to be minimized over $$\tilde \alpha$$ is

$$\begin{array}{@{}rcl@{}} \tilde J(\tilde\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \left(\tilde X_{t}^{\scriptscriptstyle{\intercal}} Q_{t} \tilde X_{t} + \tilde\alpha_{t}^{\scriptscriptstyle{\intercal}} N_{t} \tilde\alpha_{t}\right) dt + \tilde X_{T}^{\scriptscriptstyle{\intercal}} P \tilde X_{T} \right]. \end{array}$$

It was solved in (Wonham 1968) under assumption (H1) and the condition (H2)(i) that N is uniformly positive definite, and this gives the existence and uniqueness of $$K \in C^{1}\left ([0,T];\mathbb {S}^{d}\right)$$, which is nonnegative.

(2) $$D \equiv F \equiv 0$$. In this case, the BSRDE for $$(K,Z^{K})$$ is associated with the LQ problem with controlled linear dynamics

$$\begin{array}{@{}rcl@{}} d\tilde X_{t} &=& (B_{t} \tilde X_{t} + C_{t} \tilde\alpha_{t}) dt + \left({D_{t}^{0}}\tilde X_{t} + {F_{t}^{0}}\tilde\alpha_{t}\right) d{W_{t}^{0}}, \end{array}$$

where the control process $$\tilde \alpha$$ is an $$\mathbb {F}^{0}$$-adapted process with values in $$\mathbb {R}^{m}$$, and the cost functional to be minimized over $$\tilde \alpha$$ is

$$\begin{array}{@{}rcl@{}} \tilde J(\tilde\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \left(\tilde X_{t}^{\scriptscriptstyle{\intercal}} Q_{t} \tilde X_{t} + \tilde\alpha_{t}^{\scriptscriptstyle{\intercal}} N_{t} \tilde\alpha_{t}\right) dt + \tilde X_{T}^{\scriptscriptstyle{\intercal}} P \tilde X_{T} \right]. \end{array}$$

It is then known from (Tang 2003) that under assumptions (H1) and (H2)(i), there exists a unique pair $$(K,Z^{K})$$ solution to the BSRDE, with K nonnegative and essentially bounded.

(3) $$N \equiv 0$$, P is uniformly positive, $$m = d$$, and F is invertible with $$F^{-1}$$ bounded. In this case, the BSDE for K is reduced to the linear BSDE:

$$\begin{array}{@{}rcl@{}} {dK}_{t} &=& - \left[ \Phi_{t}\left(K_{t},{Z_{t}^{K}}\right) - \left(C_{t} F_{t}^{-1} D_{t}\right)^{\scriptscriptstyle{\intercal}} K_{t} - K_{t}\left(C_{t} F_{t}^{-1} D_{t}\right) \right.\\ & & \left.\;\; - \; D_{t}^{\scriptscriptstyle{\intercal}} K_{t} D_{t} \,-\, K_{t}C_{t}\left(F_{t}^{\scriptscriptstyle{\intercal}} K_{t} F_{t}\right)^{-1} C_{t}^{\scriptscriptstyle{\intercal}} K_{t} \right]\! dt \,+\, {Z_{t}^{K}} d{W_{t}^{0}}, \;\; 0 \leq t\leq T, \; K_{T} = P, \end{array}$$

for which it is known that there exists a unique solution $$(K,Z^{K})$$, with K positive and essentially bounded.
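To see where the drift in case (3) comes from, the quadratic term of the generator can be expanded directly. The following is only a sketch, under the assumption (suggested by the factor $$\left(F_{t}^{\scriptscriptstyle{\intercal}} K_{t} F_{t}\right)^{-1}$$ in the displayed BSDE, but not restated in this section) that $$\Gamma_{t}(k,k)$$ reduces to $$F_{t}^{\scriptscriptstyle{\intercal}} k F_{t}$$ and $$V_{t}(k,z)$$ to $$D_{t}^{\scriptscriptstyle{\intercal}} k F_{t} + k C_{t}$$ when $$N \equiv 0$$ and the $$F^{0}$$ contributions are absent. Time indices are dropped, and k is symmetric and invertible:

$$\begin{array}{@{}rcl@{}} V \Gamma^{-1} V^{\scriptscriptstyle{\intercal}} &=& \left(D^{\scriptscriptstyle{\intercal}} k F + k C\right) F^{-1} k^{-1} \left(F^{\scriptscriptstyle{\intercal}}\right)^{-1} \left(F^{\scriptscriptstyle{\intercal}} k D + C^{\scriptscriptstyle{\intercal}} k\right) \\ &=& D^{\scriptscriptstyle{\intercal}} k D + \left(C F^{-1} D\right)^{\scriptscriptstyle{\intercal}} k + k \left(C F^{-1} D\right) + k C \left(F^{\scriptscriptstyle{\intercal}} k F\right)^{-1} C^{\scriptscriptstyle{\intercal}} k. \end{array}$$

All four terms are subtracted from $$\Phi$$ in the generator $$\Phi - V\Gamma^{-1}V^{\scriptscriptstyle{\intercal}}$$, and their sum is symmetric, as a generator with values in $$\mathbb{S}^{d}$$ requires. In the scalar case $$d = m = 1$$ the last term equals $$(C/F)^{2} k$$, so the drift is linear in k.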

It is an open question whether existence of a solution for K to the BSRE in (23) holds in the general case. In any case, once a solution K is given, the BSDEs for the pairs $$(\Lambda,Z^{\Lambda})$$, $$(Y,Z^{Y})$$, $$(\chi,Z^{\chi})$$ are the same as in (20), and their existence and uniqueness are then obtained under the same conditions.

## Applications

### Trading with price impact and benchmark tracking

We consider an agent trading in a financial market with an inventory $$X_{t}$$, i.e., a number of shares held at time t in a risky stock, governed by

$$\begin{array}{@{}rcl@{}} {dX}_{t} &=& \alpha_{t} dt, \end{array}$$

where the control α, a real-valued $$\mathbb {F}^{0}$$-progressively measurable process in $$L^{2}(\Omega \times [0,T])$$, represents the trading rate. Given a real-valued $$\mathbb {F}^{0}$$-adapted stock price process $$(S_{t})_{0 \leq t \leq T}$$ in $$L^{2}(\Omega \times [0,T])$$, a real-valued $$\mathbb {F}^{0}$$-adapted target process $$(I_{t})_{0 \leq t \leq T}$$ in $$L^{2}(\Omega \times [0,T])$$, and a terminal benchmark H as a square integrable $${{\mathcal {F}}_{T}^{0}}$$-measurable random variable, the objective of the agent is to minimize over control processes α a cost functional of the form:

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \left(\alpha_{t} \left(S_{t} + \eta \alpha_{t}\right) + q(X_{t} - I_{t})^{2} \right) dt + \lambda (X_{T} - H)^{2} \right], \end{array}$$
(24)

where η> 0, q ≥ 0, and λ≥ 0 are constants.

Such a formulation is connected with optimal trading and hedging problems in the presence of liquidity frictions like price impact, and has been widely studied in recent years: when $$S \equiv 0$$, the cost functional in (24) arises in option hedging in the presence of transient price impact, see, e.g., (Almgren and Li 2016; Bank et al. 2015; Rogers and Singh 2010), and is also related to the problem of optimal VWAP execution, see (Cartea and Jaimungal 2015; Frei and Westray 2015), or benchmark tracking, see (Cai et al. 2015). When q = 0, the minimization of the cost functional in (24) corresponds to the optimal execution problem arising in limit order book (LOB), as originally formulated in (Almgren and Chriss 2000) in a particular Bachelier model for S, and has been extended (with general shape functions in LOB) in the literature, but mostly by assuming the martingale property of the price process, see, e.g., (Alfonsi et al. 2010; Predoiu et al. 2011). By rewriting the cost functional after square completion as

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} \left(\eta {\tilde\alpha_{t}^{2}} + q(X_{t} - I_{t})^{2} \right) dt + \lambda (X_{T} - H)^{2} \right] \; - \; \mathbb{E}\left[ {\int_{0}^{T}} \frac{{S_{t}^{2}}}{4\eta} dt \right], \end{array}$$

with $$\tilde \alpha _{t} = \alpha _{t} + \frac {S_{t}}{2\eta }$$, we see that this problem fits into the LQCMKV1 framework (with $${b_{t}^{0}} = - \frac {S_{t}}{2\eta }$$, without McKean-Vlasov dependence but with random coefficients), and Assumptions (H1), (H2) are satisfied. From Theorem 3.1, the optimal control is then given by

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*} &=& - \frac{1}{\eta} \left[\Lambda_{t} X_{t}^{*} + \frac{Y_{t}}{2} \right] - \frac{S_{t}}{2\eta}, \;\;\; 0 \leq t\leq T, \end{array}$$
(25)

where Λ is solution to the (ordinary differential) Riccati equation

$$\begin{array}{@{}rcl@{}} d\Lambda_{t} &=& -\left(q - \frac{{\Lambda_{t}^{2}}}{\eta}\right) dt, \;\;\; 0 \leq t\leq T, \; \Lambda_{T} \; = \; \lambda, \end{array}$$
(26)

and Y is solution to the linear BSDE

$$\begin{array}{@{}rcl@{}} {dY}_{t} &=& \!\left[\! 2{qI}_{t} \,+\, \frac{\Lambda_{t}}{\eta} S_{t} \,+\, \frac{\Lambda_{t}}{\eta} Y_{t} \right]\! dt \,+\, {Z_{t}^{Y}} d{W_{t}^{0}}, \;\; 0 \leq t\leq T, \; Y_{T} \; = \;\! - 2\lambda H. \end{array}$$
(27)

The solution to the Riccati equation is

$$\begin{array}{@{}rcl@{}} \frac{\Lambda_{t}}{\eta} &=& \sqrt{q/\eta} \frac{\sqrt{q/\eta}\sinh(\sqrt{q/\eta}(T-t)) + \lambda/\eta\cosh(\sqrt{q/\eta}(T-t))}{\lambda/\eta \sinh(\sqrt{q/\eta}(T-t)) + \sqrt{q/\eta}\cosh(\sqrt{q/\eta}(T-t))}, \;\;\; 0 \leq t \leq T, \end{array}$$

while the solution to the linear BSDE is given by

$$\begin{array}{@{}rcl@{}} Y_{t} &=&- 2 \mathbb{E} \left[ e^{-{\int_{t}^{T}} \frac{\Lambda_{s}}{\eta} ds} \lambda H + {\int_{t}^{T}} e^{-{\int_{t}^{s}} \frac{\Lambda_{u}}{\eta} du} \left(q I_{s} + \frac{\Lambda_{s}}{2\eta} S_{s} \right) ds \left| {{\mathcal{F}}_{t}^{0}} \right.\right], \;\; 0 \leq t \leq T. \end{array}$$

By integrating the function Λ/η, we have

$$\begin{array}{@{}rcl@{}} e^{-{\int_{t}^{s}} \frac{\Lambda_{u}}{\eta} du} &=& \frac{\Lambda_{t}/\eta}{\sqrt{q/\eta}} \frac{\sqrt{q/\eta} \cosh(\sqrt{q/\eta}(T-s)) + \lambda/\eta\sinh(\sqrt{q/\eta}(T-s))} { \sqrt{q/\eta} \sinh(\sqrt{q/\eta}(T-t)) + \lambda/\eta\cosh(\sqrt{q/\eta}(T-t))} \\ &=& \frac{\Lambda_{t}}{\Lambda_{s}} \frac{\sqrt{q/\eta} \sinh(\sqrt{q/\eta}(T-s)) + \lambda/\eta\cosh(\sqrt{q/\eta}(T-s))} { \sqrt{q/\eta} \sinh(\sqrt{q/\eta}(T-t)) + \lambda/\eta\cosh(\sqrt{q/\eta}(T-t))}, \;\;\; t \leq s \leq T, \end{array}$$

and plugging into the expectation form of Y, the optimal control in (25) is then expressed as

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*} &=& - \frac{\Lambda_{t}}{\eta} \left(X_{t}^{*} - \hat{I}_{t}^{H}\right) + \frac{1}{2\eta} \left(\mathbb{E}\left[\int_{t}^{T} \frac{\Lambda_{t}}{\eta}\frac{\omega(t,T)}{\omega(s,T)} S_{s} ds | {{\mathcal{F}}_{t}^{0}}\right] - S_{t} \right) \\ & =: & \alpha_{t}^{*,IH} + \alpha_{t}^{*,S}, \;\;\; 0 \leq t \leq T, \end{array}$$
(28)

where

$$\begin{array}{@{}rcl@{}} \hat{I}_{t}^{H} &=& \mathbb{E} \left[ \omega(t,T) H + (1- \omega(t,T)) {\int_{t}^{T}} I_{s} {\mathcal{K}}(t,s) ds \left| {{\mathcal{F}}_{t}^{0}} \right. \right] \end{array}$$

with a weight valued in [0,1]

$$\begin{array}{@{}rcl@{}} \omega(t,T) &=& \frac{\lambda/\eta}{ \sqrt{q/\eta} \sinh(\sqrt{q/\eta}(T-t)) + \lambda/\eta\cosh(\sqrt{q/\eta}(T-t))}, \end{array}$$

and a kernel

$$\begin{array}{@{}rcl@{}} {\mathcal{K}}(t,s) &=& \sqrt{q/\eta} \frac{\sqrt{q/\eta} \cosh(\sqrt{q/\eta}(T \,-\, s)) \,+\, \lambda/\eta\sinh(\sqrt{q/\eta}(T \,-\, s))}{\sqrt{q/\eta} \sinh(\sqrt{q/\eta}(T \,-\, t)) \,+\, \lambda/\eta(\cosh(\sqrt{q/\eta}(T \,-\, t)) \,-\, 1)}, \;\;\; 0\leq t \leq s \leq T. \end{array}$$
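The closed-form expressions above lend themselves to a quick numerical sanity check. The sketch below is illustrative only: the function names and parameter values are hypothetical, q > 0 is assumed, and the numerator of the kernel is evaluated at T − s, which is what is needed for $${\mathcal{K}}(t,.)$$ to depend on s and integrate to one over [t,T]. It checks that Λ/η satisfies the Riccati equation (26) divided by η, that ω(t,T) lies in [0,1], and that the kernel has unit mass.

```python
import numpy as np

def _shorthand(eta, q, lam):
    # gamma = sqrt(q/eta), lam_bar = lambda/eta (shorthand for this sketch; needs q > 0)
    return np.sqrt(q / eta), lam / eta

def lambda_over_eta(t, T, eta, q, lam):
    """Closed-form Lambda_t / eta solving the Riccati equation (26)."""
    g, lb = _shorthand(eta, q, lam)
    tau = T - t
    num = g * np.sinh(g * tau) + lb * np.cosh(g * tau)
    den = lb * np.sinh(g * tau) + g * np.cosh(g * tau)
    return g * num / den

def omega(t, T, eta, q, lam):
    """Weight omega(t, T) placed on the terminal benchmark H."""
    g, lb = _shorthand(eta, q, lam)
    tau = T - t
    return lb / (g * np.sinh(g * tau) + lb * np.cosh(g * tau))

def kernel(t, s, T, eta, q, lam):
    """Kernel K(t, s), with the numerator evaluated at T - s."""
    g, lb = _shorthand(eta, q, lam)
    num = g * np.cosh(g * (T - s)) + lb * np.sinh(g * (T - s))
    den = g * np.sinh(g * (T - t)) + lb * (np.cosh(g * (T - t)) - 1.0)
    return g * num / den

def riccati_residual(t, T, eta, q, lam, h=1e-5):
    """Residual of d(Lambda/eta)/dt = (Lambda/eta)^2 - q/eta, by central differences."""
    u = lambda_over_eta(t, T, eta, q, lam)
    du = (lambda_over_eta(t + h, T, eta, q, lam)
          - lambda_over_eta(t - h, T, eta, q, lam)) / (2.0 * h)
    return du - (u ** 2 - q / eta)

def kernel_mass(t, T, eta, q, lam, n=20001):
    """Trapezoidal approximation of the integral of K(t, s) over s in [t, T]."""
    s = np.linspace(t, T, n)
    vals = kernel(t, s, T, eta, q, lam)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(s)))

if __name__ == "__main__":
    params = dict(T=1.0, eta=0.1, q=1.0, lam=5.0)  # hypothetical values
    print("Riccati residual at t=0.5:", riccati_residual(0.5, **params))
    print("kernel mass at t=0.3:", kernel_mass(0.3, **params))
    print("omega(0.3, T):", omega(0.3, **params))
```

The residual is computed on Λ/η rather than Λ so that the same quantity appearing in the feedback rule (25) is the one being tested.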

The optimal trading rule in (28) is decomposed in two parts:

(i) The first term $$\alpha^{*,IH}$$ prescribes the agent to trade optimally towards a weighted average $$\hat {I}_{t}^{H}$$, rather than the current target position $$I_{t}$$. Indeed, $$\hat {I}^{H}$$ is a convex combination of the expected value of the terminal random target H, and of a weighted average of the running target I (notice that $${\mathcal {K}}(t,.)$$ is a nonnegative kernel integrating to one over [t,T]). The agent trades towards this target at a speed proportional to its distance from the current position, and the coefficient of proportionality is determined by the cost parameters η, q, λ and the time to maturity $$T-t$$. We retrieve the interpretation and results obtained in (Bank et al. 2015) in the limiting cases where λ = 0 (no constraint on the terminal position), and λ = ∞ (constraint on the terminal position $$X_{T} = H$$). In the case where q = 0, we have $$\Lambda_{t}/\eta = \lambda/(\eta+\lambda(T-t))$$, $$\hat {I}_{t}^{H} = \mathbb {E}[H|{{\mathcal {F}}_{t}^{0}}]$$, and we retrieve, in particular, the expression $$\alpha_{t}^{*,IH} = -X_{t}^{*}/(T-t)$$ of the optimal trading rate when H = 0 and λ → ∞, corresponding to the optimal execution problem with terminal liquidation $$X_{T} = 0$$.

(ii) The second term $$\alpha^{*,S}$$, related to the stock price, is an incentive to buy or sell depending on whether the weighted average of expected future value of the stock is larger or smaller than its current value. In particular, when the price process is a martingale, then

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*,S} & = & - \frac{S_{t}}{2\eta} \frac{\sqrt{q/\eta}}{\sqrt{q/\eta} \cosh(\sqrt{q/\eta}(T-t)) + \lambda/\eta\sinh(\sqrt{q/\eta}(T-t))} \end{array}$$

which is nonpositive for nonnegative price $$S_{t}$$, hence meaning that due to the price impact, one must sell. Moreover, in the limiting case where λ → ∞, i.e., the terminal inventory $$X_{T}$$ is constrained to achieve the target H, then $$\alpha^{*,S}$$ is zero: we retrieve the result that the optimal trading rate does not depend on the price process when it is a martingale, see (Alfonsi et al. 2010; Predoiu et al. 2011).
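The martingale formula in (ii) can be recovered in two lines from (28). As a sketch, write $$\gamma = \sqrt{q/\eta}$$ and $$\bar\lambda = \lambda/\eta$$ (shorthand introduced here only), and use $$\mathbb{E}[S_{s}|{\mathcal{F}}_{t}^{0}] = S_{t}$$ for $$s \geq t$$, so that $$S_{t}$$ factors out of the conditional expectation:

$$\begin{array}{@{}rcl@{}} \frac{\Lambda_{t}}{\eta} {\int_{t}^{T}} \frac{\omega(t,T)}{\omega(s,T)} ds &=& \frac{\Lambda_{t}}{\eta} \, \frac{{\int_{t}^{T}} \left[\gamma \sinh(\gamma(T-s)) + \bar\lambda \cosh(\gamma(T-s))\right] ds}{\gamma \sinh(\gamma(T-t)) + \bar\lambda \cosh(\gamma(T-t))} \\ &=& 1 - \frac{\gamma}{\gamma \cosh(\gamma(T-t)) + \bar\lambda \sinh(\gamma(T-t))}. \end{array}$$

Subtracting 1 and multiplying by $$S_{t}/(2\eta)$$ gives the displayed expression for $$\alpha_{t}^{*,S}$$. The second equality uses the closed form of $$\Lambda_{t}/\eta$$ and the fact that the integrand is, up to the factor $$1/\gamma$$, the derivative in s of $$-\left[\gamma \cosh(\gamma(T-s)) + \bar\lambda \sinh(\gamma(T-s))\right]$$.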

On the other hand, by applying Itô’s formula to (25), and using (26)-(27), we have

$$\begin{array}{@{}rcl@{}} d\left(\alpha_{t}^{*} + \frac{S_{t}}{2\eta} \right) &=& \frac{q}{\eta}(X_{t}^{*}-I_{t}) dt - \frac{1}{2\eta} {Z_{t}^{Y}} d{W_{t}^{0}}, \end{array}$$

which implies the notable property:

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*} + \frac{S_{t}}{2\eta} - \frac{q}{\eta} {\int_{0}^{t}} (X_{s}^{*}-I_{s}) ds, \;\;\; 0 \leq t\leq T, & & \text{is a martingale.} \end{array}$$
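In the special case where S, I and H are deterministic, the martingale part vanishes ($$Z^{Y} = 0$$) and the property above says that the displayed quantity is constant in time. The following sketch checks this numerically. Everything specific in it is an assumption made for illustration: the price path, the running target, the parameter values and the explicit Euler discretization are hypothetical choices, not taken from the text.

```python
import numpy as np

def simulate_tracking(T=1.0, eta=1.0, q=1.0, lam=2.0, x0=0.0, H=1.0, n=50000):
    """Deterministic-input version of (25)-(27), discretized by explicit Euler."""
    t = np.linspace(0.0, T, n + 1)
    dt = T / n
    S = 1.0 + 0.5 * np.sin(2.0 * np.pi * t)  # hypothetical deterministic price path
    I = t.copy()                             # hypothetical deterministic running target

    # Closed-form u = Lambda / eta from the Riccati equation (26).
    g, lb = np.sqrt(q / eta), lam / eta
    tau = T - t
    u = g * (g * np.sinh(g * tau) + lb * np.cosh(g * tau)) / (
        lb * np.sinh(g * tau) + g * np.cosh(g * tau))

    # Equation (27) with Z^Y = 0, integrated backward from Y_T = -2 * lam * H.
    Y = np.empty(n + 1)
    Y[n] = -2.0 * lam * H
    for i in range(n - 1, -1, -1):
        drift = 2.0 * q * I[i + 1] + u[i + 1] * (S[i + 1] + Y[i + 1])
        Y[i] = Y[i + 1] - drift * dt

    # Forward inventory dX = alpha* dt with the feedback rule (25).
    X = np.empty(n + 1)
    alpha = np.empty(n + 1)
    X[0] = x0
    for i in range(n + 1):
        alpha[i] = -(u[i] * X[i] + Y[i] / (2.0 * eta)) - S[i] / (2.0 * eta)
        if i < n:
            X[i + 1] = X[i] + alpha[i] * dt

    # Candidate constant: alpha* + S/(2 eta) - (q/eta) * int_0^t (X - I) ds.
    running = np.concatenate(([0.0], np.cumsum((X[:-1] - I[:-1]) * dt)))
    M = alpha + S / (2.0 * eta) - (q / eta) * running
    return t, X, alpha, M

if __name__ == "__main__":
    t, X, alpha, M = simulate_tracking()
    print("spread of M over [0, T]:", M.max() - M.min())
```

The spread of M is a discretization error and shrinks as n grows. With random inputs the same quantity would fluctuate as a martingale rather than stay constant.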

### Conditional mean-variance portfolio selection in incomplete market

We consider an agent who can invest in a financial market model with one bond of price process $$S^{0}$$ and one risky asset of price process S governed by

$$\begin{array}{@{}rcl@{}} d{S_{t}^{0}} &=& {S_{t}^{0}} r(I_{t}) dt \\ {dS}_{t} &=& S_{t} ((b+r)(I_{t}) dt + \sigma(I_{t}) {dW}_{t}), \end{array}$$

where I is a factor process with dynamics governed by a Brownian motion $$W^{0}$$, assumed to be uncorrelated with the Brownian motion W driving the asset price process S, and r the interest rate, b the excess rate of return, and σ the volatility are measurable bounded functions of I, with $$\sigma(I_{t}) \geq \varepsilon$$ for some ε > 0. We shall assume that the natural filtration generated by the observable factor process I is equal to the filtration $$\mathbb {F}^{0}$$ generated by $$W^{0}$$. Notice that the market is incomplete as the agent cannot trade in the factor process. The investment strategy of the agent is modeled by a random field $$\mathbb {F}^{0}$$-progressively measurable process $$\alpha = \{\alpha _{t}(x), 0 \leq t\leq T, x \in \mathbb {R} \}$$ (or equivalently as an $$\mathbb {F}^{0}$$-progressively measurable process with values in $$L(\mathbb {R};\mathbb {R})$$) where $$\alpha_{t}(x)$$, with values in $$\mathbb {R}$$, is Lipschitz in x, and represents the amount invested in the stock at time t, when the current wealth is $$X_{t} = x$$, and based on the past observations $${{\mathcal {F}}_{t}^{0}}$$ of the factor process. The evolution of the controlled wealth process is then given by

$$\begin{array}{@{}rcl@{}} {}{dX}_{t} &=& r(I_{t}) X_{t} dt \,+\, \alpha_{t}\! (X_{t})\! \left(b(I_{t}) dt \,+\, \sigma(I_{t}) {dW}_{t} \right), \; 0 \leq t \leq T, \; X_{0} = x_{0} \in \mathbb{R}. \end{array}$$
(29)

The objective of the agent is to minimize over investment strategies a criterion of the form:

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ \frac{\lambda}{2} \text{Var}\left(X_{T}|W^{0}\right) - \mathbb{E}\left[X_{T}|W^{0}\right] \right], \end{array}$$

where λ is a positive $${{\mathcal {F}}_{T}^{0}}$$-measurable random variable. In the absence of random factors in the dynamics of the price process, hence in a complete market model, and when λ is constant, the above criterion reduces to the classical mean-variance portfolio selection, as studied e.g. in (Li and Zhou 2000). Here, in presence of the random factor, we consider the expectation of a conditional mean-variance criterion, and also allow the risk-aversion parameter λ to depend reasonably on the random factor environment. By rewriting the cost functional as

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ \frac{\lambda}{2} {X_{T}^{2}} - \frac{\lambda}{2} \left(\mathbb{E}[X_{T}|W^{0}] \right)^{2} - X_{T} \right], \end{array}$$

we then see that this conditional mean-variance portfolio selection problem fits into the LQCMKV2 problem, and more specifically into the case (3) of the discussion following Proposition 3.2. The optimal control is then given from Proposition 3.2 by

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*}(x) &=& \,-\, \frac{b(I_{t})}{\sigma^{2}(I_{t})}\left(x \,-\, \mathbb{E}\left[X_{t}^{*}|W^{0}\right]\right) \,-\, \frac{b(I_{t})}{\sigma^{2}(I_{t}) K_{t}} \left[\! \Lambda_{t} \mathbb{E}\left[X_{t}^{*}|W^{0}\right] + \frac{1}{2} Y_{t} \right], \end{array}$$
(30)

where $$X^{*}$$ is the optimal wealth process in (29) controlled by $$\alpha^{*}$$, K is the solution to the linear BSDE

$$\begin{array}{@{}rcl@{}} {dK}_{t} &=& \left[ \frac{b^{2}(I_{t})}{\sigma^{2}(I_{t})} - 2 r(I_{t}) \right]K_{t} dt + {Z_{t}^{K}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; K_{T} = \frac{\lambda}{2}, \end{array}$$

Λ is the solution to the (Riccati-type) BSDE

$$\begin{array}{@{}rcl@{}} d\Lambda_{t} &=& \left[ \frac{b^{2}(I_{t})}{\sigma^{2}(I_{t})K_{t}} {\Lambda_{t}^{2}} - 2 r(I_{t}) \Lambda_{t} \right] dt + Z_{t}^{\Lambda} d{W_{t}^{0}}, \;\;\; 0 \leq t\leq T, \; \Lambda_{T} = 0, \end{array}$$

and Y the solution to the linear BSDE

$$\begin{array}{@{}rcl@{}} {dY}_{t} &=& \left[ \frac{b^{2}(I_{t})\Lambda_{t}}{\sigma^{2}(I_{t})K_{t}} - r(I_{t}) \right] Y_{t} dt + {Z_{t}^{Y}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; Y_{T} = -1. \end{array}$$

The solutions to these BSDEs are explicitly given by

$$\begin{array}{@{}rcl@{}} K_{t} &=& \mathbb{E} \left[ \frac{\lambda}{2} \exp\left({\int_{t}^{T}} \left(2 r(I_{s}) - \frac{b^{2}(I_{s})}{\sigma^{2}(I_{s})}\right) ds\right) \left| {{\mathcal{F}}_{t}^{0}} \right. \right], \end{array}$$
(31)

$$\Lambda \equiv 0$$, and

$$\begin{array}{@{}rcl@{}} Y_{t} &=& - \mathbb{E} \left[ \exp\left({\int_{t}^{T}} r(I_{s}) ds \right) \left| {{\mathcal{F}}_{t}^{0}} \right. \right], \;\;\; 0 \leq t \leq T. \end{array}$$
(32)

From (29) and (30), the conditional mean of the optimal wealth $$X^{*}$$ with portfolio strategy $$\alpha^{*}$$ is governed by

$$\begin{array}{@{}rcl@{}} d \mathbb{E}\left[X_{t}^{*}|W^{0}\right] &=& \left[ r(I_{t}) \mathbb{E}\left[X_{t}^{*}|W^{0}\right] - \frac{b^{2}(I_{t})}{2\sigma^{2}(I_{t})} \frac{Y_{t}}{K_{t}} \right] dt, \end{array}$$

hence explicitly given by

$$\begin{array}{@{}rcl@{}} \mathbb{E}\left[X_{t}^{*}|W^{0}\right] &=& x_{0} e^{{\int_{0}^{t}} r(I_{s}) ds} - {\int_{0}^{t}} \frac{b^{2}(I_{s})}{2\sigma^{2}(I_{s})} \frac{Y_{s}}{K_{s}} e^{{\int_{s}^{t}} r(I_{u}) du} ds, \;\;\; 0 \leq t \leq T. \end{array}$$

Plugging into (30), this gives the explicit form of the optimal control for the conditional mean-variance portfolio selection problem:

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*}(X_{t}^{*}) &=& \frac{b(I_{t})}{\sigma^{2}(I_{t})} \left[ x_{0} e^{{\int_{0}^{t}} r(I_{s}) ds} - X_{t}^{*} + \frac{1}{2}\left({\int_{0}^{t}} \frac{b^{2}(I_{s})}{\sigma^{2}(I_{s})} \frac{|Y_{s}|}{K_{s}} e^{{\int_{s}^{t}} r(I_{u}) du} ds + \frac{|Y_{t}|}{K_{t}} \right) \right], \end{array}$$
(33)

for all $$0 \leq t \leq T$$, with K and Y in (31)-(32). When b, σ, and r do not depend on I, we retrieve the expression of the optimal control obtained in (Li and Zhou 2000), and the formula (33) is an extension to the case of an incomplete market with a factor I independent of the stock price.
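When b, σ, r and λ are constants, (31)-(32) give $$K_{t} = \frac{\lambda}{2} e^{(2r - b^{2}/\sigma^{2})(T-t)}$$ and $$Y_{t} = -e^{r(T-t)}$$, and carrying out the time integral in (33) by hand suggests that the bracket collapses to $$x_{0} e^{rt} + \frac{1}{\lambda} e^{(b^{2}/\sigma^{2})T - r(T-t)} - X_{t}^{*}$$. The sketch below compares a numerical evaluation of (33) with this hand-derived closed form. The function names and parameter values are hypothetical, and the closed form is a derivation made for this illustration under the constant-coefficient assumption, not a formula quoted from the text.

```python
import numpy as np

def bracket_from_formula_33(t, T, x0, b, sigma, r, lam, n=20001):
    """Bracket of (33), without the -X_t^* term, for constant b, sigma, r, lambda."""
    theta = b ** 2 / sigma ** 2

    def K(u):
        # (31) with constant coefficients
        return 0.5 * lam * np.exp((2.0 * r - theta) * (T - u))

    def abs_Y(u):
        # |Y_u| from (32) with constant coefficients
        return np.exp(r * (T - u))

    s = np.linspace(0.0, t, n)
    integrand = theta * abs_Y(s) / K(s) * np.exp(r * (t - s))
    # Trapezoidal rule written out by hand to avoid version-specific NumPy helpers.
    integral = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s)))
    return x0 * np.exp(r * t) + 0.5 * (integral + abs_Y(t) / K(t))

def bracket_closed_form(t, T, x0, b, sigma, r, lam):
    """Hand-derived closed form of the same bracket (assumption of this sketch)."""
    theta = b ** 2 / sigma ** 2
    return x0 * np.exp(r * t) + np.exp(theta * T - r * (T - t)) / lam

if __name__ == "__main__":
    args = dict(T=1.0, x0=1.0, b=0.05, sigma=0.2, r=0.02, lam=2.0)  # hypothetical
    for t in (0.0, 0.4, 1.0):
        print(t, bracket_from_formula_33(t, **args), bracket_closed_form(t, **args))
```

The optimal amount invested is then $$b/\sigma^{2}$$ times the bracket minus the current wealth, so the strategy sells the stock as wealth rises towards the time-dependent target.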

### Systemic risk model

We consider a model of inter-bank borrowing and lending where the log-monetary reserves $$X^{i}$$, i = 1,…,n, of n banks are driven by

$$\begin{array}{@{}rcl@{}} d{X_{t}^{i}} &=& \! \frac{\kappa(I_{t})}{n} \sum_{j=1}^{n} \! \left(\! {X_{t}^{j}} \,-\, {X_{t}^{i}}\! \right)\! dt \,+\, {\alpha_{t}^{i}} dt \,+\, \sigma(I_{t}) \left(\! \sqrt{1-\rho^{2}(I_{t})}\, d{W_{t}^{i}} \,+\, \rho(I_{t}) d{W_{t}^{0}}\! \right)\!, \; i=1,\ldots,n, \end{array}$$

where $$I_{t}$$ is a factor process driven by a Brownian motion $$W^{0}$$, which is the common noise for all the banks, $$W^{i}$$, i = 1,…,n, are independent Brownian motions, independent of $$W^{0}$$, called idiosyncratic noises, $$\rho(I_{t}) \in [-1,1]$$ is the correlation between the idiosyncratic noise and the common noise, $$\kappa(I_{t}) \geq 0$$ is the rate of mean-reversion in the interaction from borrowing and lending between the banks, and $$\sigma(I_{t}) > 0$$ is the volatility of the bank reserves. Compared to the original model introduced in (Carmona et al. 2015), these coefficients may depend on the common factor process I. Each bank i can control its rate of borrowing/lending to a central bank via the control $${\alpha _{t}^{i}}$$ in order to minimize

$$\begin{array}{@{}rcl@{}} J^{i}\left(\alpha^{1},\ldots,\alpha^{n}\right) &=& \mathbb{E} \left[ {\int_{0}^{T}} f_{t}\left({X_{t}^{i}}, \frac{1}{n} \sum_{j=1}^{n} {X_{t}^{j}},{\alpha_{t}^{i}}\right) dt + g\left({X_{T}^{i}},\frac{1}{n} \sum_{j=1}^{n} {X_{T}^{j}}\right) \right], \end{array}$$

where

$$\begin{array}{@{}rcl@{}} f_{t}(x,\bar x,a) \; = \; \frac{1}{2} a^{2} - q(I_{t}) a(\bar x - x) + \frac{\eta(I_{t})}{2}(x-\bar x)^{2}, & & g(x,\bar x) \; = \; \frac{c}{2} (x-\bar x)^{2}. \end{array}$$

Here $$q(I_{t}) > 0$$ is a positive $$\mathbb {F}^{0}$$-adapted process for the incentive to borrowing ($${\alpha _{t}^{i}} > 0$$) or lending ($${\alpha _{t}^{i}} < 0$$), $$\eta(I_{t}) > 0$$ is a positive $$\mathbb {F}^{0}$$-adapted process, c > 0 is a positive $${{\mathcal {F}}_{T}^{0}}$$-measurable random variable, for penalizing departure from the average, and these coefficients may depend on the random factor. For this n-player stochastic differential game, one looks for cooperative equilibria by taking the point of view of a center of decision (or social planner), which decides on the strategies for all banks, with the goal of minimizing the global cost to the collective. More precisely, given the symmetry of the set-up, when the social planner chooses the same control policy for all the banks in feedback form: $${\alpha _{t}^{i}} = \tilde \alpha \left (t,{X_{t}^{i}},\frac {1}{n}\sum _{j=1}^{n} {X_{t}^{j}},I_{t}\right)$$, i = 1,…,n, for some deterministic function $$\tilde \alpha$$ depending upon time, private state of bank i, the empirical mean of all banks, and factor I, then the theory of propagation of chaos implies that, in the limit n → ∞, the log-monetary reserve processes $$X^{i}$$ become asymptotically independent conditionally on the random environment $$W^{0}$$, and the empirical mean $$\frac {1}{n}\sum _{j=1}^{n} {X_{t}^{j}}$$ converges to the conditional mean $$\mathbb {E}[X_{t}|W^{0}]$$ of $$X_{t}$$ given $$W^{0}$$, and X is governed by the conditional McKean-Vlasov equation:

$$\begin{array}{@{}rcl@{}} {dX}_{t} &=& \left[ \kappa(I_{t})\left(\mathbb{E}\left[X_{t} | W^{0}\right] - X_{t}\right) + \tilde\alpha\left(t,X_{t},\mathbb{E}\left[X_{t}|W^{0}\right],I_{t}\right) \right] dt \\ & & \;\;\; + \; \sigma(I_{t})\left(\sqrt{1-\rho^{2}(I_{t})} {dW}_{t} + \rho(I_{t}) d{W_{t}^{0}}\right), \; X_{0} \; = \; x_{0} \in \mathbb{R}, \end{array}$$

for some Brownian motion W independent of $$W^{0}$$. More generally, the representative bank can control its rate of borrowing/lending via a random field $$\mathbb {F}^{0}$$-adapted process $$\alpha = \{\alpha _{t}(x),x\in \mathbb {R}\}$$, leading to the log-monetary reserve dynamics:

$$\begin{array}{@{}rcl@{}} {dX}_{t} &=& \left[ \kappa(I_{t})\left(\mathbb{E}\left[X_{t} | W^{0}\right] - X_{t}\right) + \alpha_{t}(X_{t}) \right] dt \\ & & \;\;\; + \; \sigma(I_{t})\left(\sqrt{1-\rho^{2}(I_{t})} {dW}_{t} + \rho(I_{t}) d{W_{t}^{0}}\right), \; X_{0} \; = \; x_{0} \in \mathbb{R}, \end{array}$$
(34)

and the objective is to minimize over α

$$\begin{array}{@{}rcl@{}} J(\alpha) &=& \mathbb{E} \left[ {\int_{0}^{T}} f_{t}\left(X_{t},\mathbb{E}\left[X_{t}|W^{0}\right],\alpha_{t}(X_{t})\right) dt + g\left(X_{T},\mathbb{E}\left[X_{T}|W^{0}\right]\right) \right]. \end{array}$$

After square completion, we can rewrite the cost functional as

$$\begin{array}{@{}rcl@{}} J(\alpha) &=&\! \mathbb{E} \left[ {\int_{0}^{T}}\! \!\left(\! \frac{1}{2} \bar\alpha_{t}(X_{t})^{2} \,+\, \frac{(\eta-q^{2})(I_{t})}{2} \left(\mathbb{E}\left[\!X_{t}| W^{0}\right] \,-\, X_{t}\!\right)^{2}\! \right) dt \,+\, \frac{c}{2}\! \left(\! \mathbb{E}\left[X_{T} | W^{0}\right] \,-\, X_{T}\right)^{2} \right], \end{array}$$

with $$\bar \alpha _{t}(X_{t}) = \alpha _{t}(X_{t}) - q(I_{t})\left(\mathbb {E}[X_{t}|W^{0}]-X_{t}\right)$$. Assuming that $$q^{2} \leq \eta$$, this model fits into the LQCMKV2 problem, and more specifically into the case (2) of the discussion following Proposition 3.2. The optimal control is then given from Proposition 3.2 by

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*}(x) &=& -\left(2K_{t} +q(I_{t})\right) \left(x - \mathbb{E}\left[X_{t}^{*}|W^{0}\right]\right) - 2 \Lambda_{t} \mathbb{E}\left[X_{t}^{*}|W^{0}\right] - Y_{t}, \; \end{array}$$
(35)

where $$X^{*}$$ is the optimal log-monetary reserve in (34) controlled by $$\alpha^{*}$$, K is the solution to the BSRE:

$$\begin{array}{@{}rcl@{}} {dK}_{t} &=&\! \left[\! 2(\kappa + q)(I_{t}) K_{t} \,+\, 2 {K_{t}^{2}} \,-\, \frac{1}{2} \left(\eta \,-\, q^{2}\right)(I_{t}) \right]\! dt \,+\, {Z_{t}^{K}} d{W_{t}^{0}}, \;\; 0 \leq t \leq T, \; K_{T} = \frac{c}{2}, \end{array}$$

Λ is the solution to the BSRE

$$\begin{array}{@{}rcl@{}} d\Lambda_{t} &=& 2 {\Lambda_{t}^{2}} dt + Z_{t}^{\Lambda} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; \Lambda_{T} = 0, \end{array}$$

and Y is the solution to the linear BSDE

$$\begin{array}{@{}rcl@{}} {dY}_{t} &=& \left[ 2 \Lambda_{t} Y_{t} - 2 \sigma(I_{t}) \rho(I_{t}) {Z_{t}^{Y}}\right] dt + {Z_{t}^{Y}} d{W_{t}^{0}}, \;\;\; 0 \leq t \leq T, \; Y_{T} = 0. \end{array}$$

The nonnegative solution K to the BSRE is, in general, not explicit, while the solution for (Λ,Y) is obviously $$\Lambda \equiv 0 \equiv Y$$. From (35), it is then clear that $$\mathbb {E}[\alpha _{t}^{*}(X_{t}^{*})|W^{0}] = 0$$, so that the conditional mean of the optimal log-monetary reserve is governed from (34) by

$$\begin{array}{@{}rcl@{}} d \mathbb{E}[X_{t}^{*}|W^{0}] &=& \sigma(I_{t}) \rho(I_{t}) d{W_{t}^{0}}. \end{array}$$

The optimal control can then be expressed pathwise as

$$\begin{array}{@{}rcl@{}} \alpha_{t}^{*}(X_{t}^{*}) &=& - (2K_{t} +q(I_{t}))\left(X_{t}^{*} - x_{0} - {\int_{0}^{t}} \sigma(I_{s}) \rho(I_{s}) d{W_{s}^{0}}\right), \;\;\; 0 \leq t \leq T. \end{array}$$
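For constant coefficients the BSRE for K becomes a scalar Riccati ODE, and the feedback rule (35) can be tried out on a finite system of n banks, with the empirical mean standing in for the conditional mean. The sketch below is illustrative only: the parameter values, function names and Euler/RK4 discretizations are hypothetical choices, and the quadratic term of the Riccati ODE is taken with the sign produced by the generator in (23), i.e. $$dK/dt = 2(\kappa+q)K + 2K^{2} - \frac{1}{2}(\eta - q^{2})$$, which is an assumption of this sketch. Because the feedback is proportional to the deviation from the mean, the controls average to zero and the empirical mean is driven by noise alone.

```python
import numpy as np

def solve_riccati(T, kappa, q, eta, c, n_steps):
    """Backward RK4 for dK/dt = 2(kappa+q)K + 2K^2 - (eta - q^2)/2, K_T = c/2."""
    dt = T / n_steps

    def f(K):
        return 2.0 * (kappa + q) * K + 2.0 * K ** 2 - 0.5 * (eta - q ** 2)

    K = np.empty(n_steps + 1)
    K[-1] = 0.5 * c
    for i in range(n_steps, 0, -1):
        # One RK4 step of size -dt, from t_i back to t_{i-1}.
        k1 = f(K[i])
        k2 = f(K[i] - 0.5 * dt * k1)
        k3 = f(K[i] - 0.5 * dt * k2)
        k4 = f(K[i] - dt * k3)
        K[i - 1] = K[i] - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return K

def simulate_banks(n_banks=500, n_steps=200, T=1.0, kappa=1.0, q=0.5, eta=1.0,
                   c=1.0, sigma=0.3, rho=0.5, x0=0.0, seed=0):
    """Euler scheme for n banks using the feedback -(2K+q)(x - empirical mean)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    K = solve_riccati(T, kappa, q, eta, c, n_steps)
    X = np.full(n_banks, float(x0))
    mean_path = np.empty(n_steps + 1)
    predicted = np.empty(n_steps + 1)
    mean_path[0] = predicted[0] = x0
    for i in range(n_steps):
        m = X.mean()
        alpha = -(2.0 * K[i] + q) * (X - m)          # (35) with Lambda = Y = 0
        dW = rng.normal(0.0, np.sqrt(dt), n_banks)   # idiosyncratic increments
        dW0 = rng.normal(0.0, np.sqrt(dt))           # common-noise increment
        X = X + (kappa * (m - X) + alpha) * dt + sigma * (
            np.sqrt(1.0 - rho ** 2) * dW + rho * dW0)
        mean_path[i + 1] = X.mean()
        # Noise-only prediction for the empirical mean (no drift contribution).
        predicted[i + 1] = predicted[i] + sigma * (
            np.sqrt(1.0 - rho ** 2) * dW.mean() + rho * dW0)
    return K, mean_path, predicted

if __name__ == "__main__":
    K, mean_path, predicted = simulate_banks()
    print("K_0 =", K[0], " K_T =", K[-1])
    print("max |mean - prediction| =", np.abs(mean_path - predicted).max())
```

As n grows, the averaged idiosyncratic increments vanish and the prediction reduces to $$x_{0} + \int_{0}^{t} \sigma \rho \, d{W_{s}^{0}}$$, in line with the pathwise expression of the optimal control above.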

## References

1. Alfonsi, A, Fruth, A, Schied, A: Optimal execution strategies in limit order books with general shape functions. Quantit. Finance. 10, 143–157 (2010).

2. Almgren, R, Chriss, N: Optimal execution of portfolio transactions. J. Risk. 3, 5–39 (2000).

3. Almgren, R, Li, TM: Market Microstructure and Liquidity. 2(1) (2016).

4. Andersson, D, Djehiche, B: A maximum principle for SDEs of mean-field type. Appl. Math. Optimization. 63, 341–356 (2010).

5. Bain, A, Crisan, D: Fundamentals of stochastic filtering, Series Stochastic Modelling and Applied Probability, Vol. 60. Springer, New York (2009).

6. Bank, P, Soner, M, Voss, M: Hedging with transient price impact. arXiv: 1510.03223v1, to appear in Mathematics and Financial Economics (2015). http://link.springer.com/article/10.1007/s11579-016-0178-4.

7. Basak, S, Chabakauri, G: Dynamic mean-variance asset allocation. Rev. Finan. Stud. 23, 2970–3016 (2010).

8. Bensoussan, A, Frehse, J, Yam, P: Mean Field Games and Mean Field Type Control Theory. Springer (2013).

9. Bismut, JM: Linear quadratic optimal stochastic control with random coefficients. SIAM J. Control Optim. 14, 419–444 (1976).

10. Borkar, V, Kumar, KS: McKean-Vlasov limit in portfolio optimization. Stoch. Anal. Appl. 28, 884–906 (2010).

11. Buckdahn, R, Djehiche, B, Li, J: A general maximum principle for SDEs of mean-field type. Appl. Math. Optim. 64(2), 197–216 (2011).

12. Buckdahn, R, Li, J, Peng, S, Rainer, C: Mean-field stochastic differential equations and associated PDEs (2014). http://arxiv.org/abs/1407.1215, to appear on Annals of Probability, preprint available at http://www.imstat.org/aop/future_papers.htm.

13. Cai, J, Rosenbaum, M, Tankov, P: Asymptotic lower bounds for optimal tracking: a linear programming approach (2015). arXiv:1510.04295.

14. Cardaliaguet, P: Notes on mean field games. Notes from P.L. Lions lectures at Collège de France (2012). https://www.ceremade.dauphine.fr/cardalia/MFG100629.pdf.

15. Carmona, R, Delarue, F: The Master equation for large population equilibriums. In: Crisan, D, et al. (eds.) Stochastic Analysis and Applications 2014, Springer Proceedings in Mathematics and Statistics 100. Springer (2014).

16. Carmona, R, Delarue, F: Forward-backward Stochastic Differential Equations and Controlled McKean Vlasov Dynamics. Ann. Probab. 43(5), 2647–2700 (2015).

17. Carmona, R, Zhu, X: A probabilistic approach to mean field games with major and minor players. Ann. Appl. Prob. 26(3), 1535–1580 (2016). arXiv: 1409.7141v1.

18. Carmona, R, Delarue, F, Lachapelle, A: Control of McKean-Vlasov dynamics versus mean field games. Math. Financial Econ. 7, 131–166 (2013).

19. Carmona, R, Fouque, JP, Sun, LH: Mean field games and systemic risk. Communications in Mathematical Sciences. 13(4), 911–933 (2015).

20. Cartea, A, Jaimungal, S: A closed-form execution strategy to target VWAP. to appear in SIAM Journal of Financial Mathematics (2015). Preprint available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2542314.

21. Chassagneux, JF, Crisan, D, Delarue, F: A probabilistic approach to classical solutions of the master equation for large population equilibria (2015). arXiv:1411.3009.

22. El Karoui, N: Les aspects probabilistes du contrôle stochastique. Ninth Saint Flour Probability Summer School-1979. Lecture Notes Math. 876, 73–238. Springer (1981).

23. Frei, C, Westray, N: Optimal execution of a VWAP order: a stochastic control approach. Math. Finance. 25, 612–639 (2015).

24. Hu, Y, Jin, H, Zhou, XY: Time-inconsistent stochastic linear-quadratic control. SIAM J. Control Optim. 50, 1548–1572 (2012).

25. Huang, J, Li, X, Yong, J: A linear-quadratic optimal control problem for mean-field stochastic differential equations in infinite horizon. Math. Control Related Fields. 5, 97–139 (2015).

26. Kunita, H: Ecole d’Eté de Probabilités de Saint-Flour XII. Springer-Verlag, Berlin, New York (1982).

27. Li, D, Zhou, XY: Continuous-time mean-variance portfolio selection: a stochastic LQ framework. Appl. Math. Optim. 42, 19–33 (2000).

28. Li, X, Sun, J, Yong, J: Mean-Field Stochastic Linear Quadratic Optimal Control Problems: Closed-Loop Solvability (2016). arXiv:1602.07825.

29. Lions, PL: Cours au Collège de France: Théorie des jeux à champ moyens, audio conference 2006–2012 (2012).

30. Peng, S: Stochastic Hamilton Jacobi Bellman equations. SIAM J. Control Optim. 30, 284–304 (1992).

31. Pham, H, Wei, X: Bellman equation and viscosity solutions for mean-field stochastic control problem (2015). arXiv:1512.07866v2.

32. Pham, H, Wei, X: Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics (2016). arXiv:1604.04057.

33. Predoiu, S, Shaikhet, G, Shreve, S: Optimal execution in a general one-sided limit-order book. SIAM J. Financial Math. 2, 183–212 (2011).

34. Rogers, LCG, Singh, S: The cost of illiquidity and its effects on hedging. Math. Finance. 20, 597–615 (2010).

35. Sun, J: Mean-Field Stochastic Linear Quadratic Optimal Control Problems: Open-Loop Solvabilities (2015). arXiv:1509.02100v2.

36. Sun, J, Yong, J: Linear Quadratic Stochastic Differential Games: Open-Loop and Closed-Loop Saddle Points. SIAM J. Control Optim. 52, 4082–4121 (2014).

37. Tang, S: General linear quadratic optimal stochastic control problems with random coefficients: linear stochastic Hamilton systems and backward stochastic Riccati equations. SIAM J. Control Optim. 42, 53–75 (2003).

38. Wonham, W: On a matrix Riccati equation of stochastic control. SIAM J. Control. 6, 681–697 (1968).

39. Yong, J: A linear-quadratic optimal control problem for mean-field stochastic differential equations. SIAM J. Control Optim. 51(4), 2809–2838 (2013).

40. Yong, J, Zhou, XY: Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, New York (1999).

## Acknowledgments

This work is part of the ANR project CAESARS (ANR-15-CE05-0024), and is also supported by FiME (Finance for Energy Market Research Centre) and the “Finance et Développement Durable - Approches Quantitatives” EDF - CACIB Chair.

### Competing interests

The author declares that he has no competing interests.

## Author information

### Corresponding author

Correspondence to Huyên Pham.

## Rights and permissions 