On approximation of BSDE and multi-step MLE-processes
Probability, Uncertainty and Quantitative Risk volume 1, Article number: 4 (2016)
Abstract
We consider the problem of approximation of the solution of a backward stochastic differential equation in the Markovian case. We suppose that the forward equation depends on some unknown finite-dimensional parameter. The approximation is based on the solution of a partial differential equation and on multi-step estimator-processes for the unknown parameter. As the model of observations of the forward equation we take a diffusion process with small volatility. First we establish a lower bound on the errors of all approximations, and then we propose an approximation which is asymptotically efficient in the sense of this bound. The obtained results are illustrated by the example of the Black-Scholes model.
Introduction
We consider the problem of approximation of the solution of the backward stochastic differential equation (BSDE) in the so-called Markovian case. Let us recall some basics of BSDEs. We are given a stochastic differential equation (called forward)
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(t,X_{t}\right)\mathrm{d}t+ \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}=x_{0}, \; 0\leq t\leq T, \end{array} $$
where S(t,x) is the drift coefficient, σ(t,x)² is the diffusion coefficient, and (W_t, 0≤t≤T) is a standard Wiener process. In addition, we have two functions f(t,x,y,z) and Φ(x), and we must construct a couple of processes (Y_t, Z_t, 0≤t≤T) such that the solution of the equation
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t}=-f\left(t,X_{t},Y_{t},Z_{t}\right)\mathrm{d}t+Z_{t}\,\mathrm{d}W_{t}, \quad 0\leq t\leq T \end{array} $$
(called backward) has the final value Y_T=Φ(X_T). Such BSDEs were first introduced by Bismut (1973) in the linear case, and the general theory was developed by Pardoux and Peng (1990). The Markovian case considered in this work was studied by Pardoux and Peng (1992); see also Section 4 in El Karoui et al. (1997). This model is also called a forward-backward stochastic differential equation (FBSDE) (El Karoui et al. 1997).
The construction of the backward equation is realized as follows. Suppose that u(t,x) satisfies the parabolic partial differential equation
$$\begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}+S\left(t,x\right)\frac{\partial u}{\partial x}+\frac{\sigma \left(t,x\right)^{2}}{2}\,\frac{\partial^{2} u}{\partial x^{2}}=-f\left(t,x,u,\sigma \left(t,x\right) u'_{x}\right) \end{array} $$
with the final condition u(T,x)=Φ(x). Let us set Y_t=u(t,X_t) and Z_t=σ(t,X_t)u'_x(t,X_t). Then, by Itô's formula,
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t}=\left[\frac{\partial u}{\partial t}+S\left(t,X_{t}\right)\frac{\partial u}{\partial x}+\frac{\sigma \left(t,X_{t}\right)^{2}}{2}\,\frac{\partial^{2} u}{\partial x^{2}}\right]\mathrm{d}t+\sigma \left(t,X_{t}\right)u'_{x}\,\mathrm{d}W_{t} =-f\left(t,X_{t},Y_{t},Z_{t}\right)\mathrm{d}t+Z_{t}\,\mathrm{d}W_{t}.\end{array} $$
The final value Y T =u(T,X T )=Φ(X T ). Therefore, if we have the solution u(t,x), then we immediately obtain the BSDE.
We are interested in the problem of approximation of (Y_t, Z_t, 0≤t≤T) in the situation where the forward equation contains some unknown finite-dimensional parameter 𝜗:
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,t,X_{t}\right)\mathrm{d}t+ \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}=x_{0}, \; 0\leq t\leq T. \end{array} $$
Then the solution of the PDE is u=u(t,x,𝜗). We cannot simply let Y_t=u(t,X_t,𝜗) because we do not know 𝜗. Of course, the natural way to approximate Y_t and Z_t is first to estimate the unknown parameter 𝜗 with the help of some estimator \(\bar {\vartheta }\) and then to put, say, \({\bar {Y}}_{t}=u(t,X_{t},{\bar {\vartheta }}) \). We can guess that if \(\bar {\vartheta }\) is a good estimator of 𝜗, then \({\bar {Y}}_{t}\) will be a good estimator of Y_t. There are several problems that are interesting to study in this framework. We must understand which conditions imposed on the estimator \(\bar {\vartheta }\) allow us to say that it is good. We consider that a good estimator has the following properties.
1. To estimate Y_t we need an estimator constructed from the observations of the solution of the forward equation up to time t, i.e., \({\bar {\vartheta }}_{t} = {\bar {\vartheta }}_{t} \left (X_{s},0\leq s\leq t\right)\), 0<t≤T.
2. As we need such an estimator for all t∈(0,T], we suppose that its calculation must be relatively simple.
3. The error of estimation, say, \({\mathbf {E}}_{\vartheta _{0}}\left ({\bar {\vartheta }}_{t}-\vartheta _{0}\right)^{2}\), must be as small as possible.
Therefore \(\bar {\vartheta }\) is an estimator-process \({\bar {\vartheta }} =\left ({\bar {\vartheta }}_{t},0<t\leq T\right) \). Of course, the construction of such an estimator-process is an intermediate problem. The main problem is to obtain good approximations of Y_t and Z_t. In particular, we must show that the approximations
$$\begin{array}{@{}rcl@{}} {\bar{Y}}_{t}=u\left(t,X_{t},{\bar{\vartheta}}_{t}\right),\qquad {\bar{Z}}_{t}=u'_{x}\left(t,X_{t},{\bar{\vartheta}}_{t}\right)\sigma \left(t,X_{t}\right) \end{array} $$
are in some sense asymptotically optimal, i.e., it is impossible to have approximations of these processes with asymptotic errors smaller than that of \({\bar {Y}}_{t} \) and \({\bar {Z}}_{t}\).
The goal of the study initiated in Kutoyants and Zhou (2014) is to realize such a program for three models of observations of the forward equation. As is usual in statistics, we consider situations where consistent estimation of the unknown parameters and processes is possible. Therefore, we are interested in the following well-known models of observations.
- Diffusion process with an unknown parameter in the drift coefficient and small noise (small volatility)
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,t,X_{t}\right)\mathrm{d}t+\varepsilon \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad x_{0}, \; 0\leq t\leq T. \end{array} $$(1)
Here the time T of observations X^T=(X_t, 0≤t≤T) is fixed and the limit corresponds to ε→0.
- Diffusion process
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(t,X_{t}\right)\mathrm{d}t+ \sigma \left(\vartheta,t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T, \end{array} $$(2)
observed at the discrete times \(X^{n}=\left (X_{t_{0}},X_{t_{1}},\ldots, X_{t_{n}}\right)\), \(t_{i}=i\frac {T}{n}\). Here the unknown parameter is in the volatility coefficient and the limit corresponds to n→∞ (high-frequency model of observations). The time T of observations is fixed.
- Ergodic diffusion process
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,X_{t}\right)\mathrm{d}t+\sigma \left(X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T. \end{array} $$(3)
Here the unknown parameter 𝜗 is in the drift coefficient, we have continuous-time observations X^T=(X_t, 0≤t≤T), and the limit is T→∞.
Of course, other statements are possible. For example, one can consider a mixture of the discrete-time and ergodic diffusion models. This corresponds to the equation
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,X_{t}\right)\mathrm{d}t+\sigma \left(X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T_{n}, \end{array} $$
and observations \(X^{n}=\left (X_{t_{0}},X_{t_{1}},\ldots, X_{t_{n}}\right)\). Here \(\max_{i}\left|t_{i}-t_{i-1}\right|\rightarrow 0\) and T_n→∞. Such a model of parameter estimation was studied, e.g., in Kamatani and Uchida (2015) and Uchida and Yoshida (2014). It is also possible to consider a mixture of the discrete-time and small noise models, models with X_t→±∞, or models with a null recurrent forward equation, etc. It would be interesting to see the statements of the statistical problems in non-Markovian cases for more general models.
Let us describe the general framework of the statistical study of the three models (1)-(3) mentioned above. For each model we propose an estimator-process \(\vartheta _{t}^{\star },0< t\leq T,\) such that \( Y^{\star }_{t}=u(t,X_{t},\vartheta _{t}^{\star })\rightarrow Y_{t} \) and the error of approximation \({\mathbf {E}}_{\vartheta } \left (Y^{\star }_{t}-Y_{t} \right)^{2}\) is asymptotically minimal. In the earlier works Kutoyants and Zhou (2014), Gasparyan and Kutoyants (2015), and (Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation) (see the review of these works in Kutoyants (2014)) we considered the approximation of the solution of BSDEs with a learning interval of fixed length.
The optimality of estimators of Y_t and Z_t is understood as follows. We define for each model a normalization function φ→0, i.e., φ_ε→0 as ε→0, φ_n→0 as n→∞, and φ_T→0 as T→∞.
We propose lower bounds on the risks of all estimators of the form
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\varliminf\ \sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varphi^{-2}\, {\mathbf{E}}_{\vartheta }\left({\bar{Y}}_{t}-Y_{t}\right)^{2}\geq R\left(t,\vartheta_{0}\right), \end{array} $$
which allow us to define the asymptotically efficient estimators \(Y_{t}^{\star } \) of Y_t as those for which
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\lim\ \sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varphi^{-2}\, {\mathbf{E}}_{\vartheta }\left(Y_{t}^{\star }-Y_{t}\right)^{2}= R\left(t,\vartheta_{0}\right). \end{array} $$
Here R(t,𝜗_0) is the corresponding minimax bound, and the unlabelled limits are taken as ε→0, n→∞, or T→∞, respectively.
We suppose that the last equality takes place for all 𝜗 0∈Θ and all t∈(0,T]. We also have a similar bound in the problem of estimation of Z t . For models (1) and (3) these bounds are slight modifications of the Hajek-Le Cam lower bound (Ibragimov and Has’minskii 1981) and for model (2) the lower bound is similar to Jeganathan’s lower bound (Jeganathan 1983).
We take the quadratic loss function just for simplicity of exposition. For all the mentioned models, similar lower bounds and corresponding estimator-processes can be obtained for more general loss functions.
The approximation of the solution of BSDEs in the Markovian case was initiated in Kutoyants and Zhou (2014), where the model of small volatility was considered. The parameter 𝜗 was supposed to be one-dimensional and the approximation-process \(Y_{t,\varepsilon }^{\star } \) was defined for t∈[τ,T], where τ>0 is a fixed value.
In Gasparyan and Kutoyants (2015), we considered the model of discrete-time observations (2) and the one-step MLE-process, which allowed us to construct an estimator-process \(Y_{t_{k},n}^{\star }\) for the values \(t_{k,n}\in [\tau,T]\), where τ>0 is fixed.
The case of ergodic diffusion process is considered in the work (Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation), which is still in progress.
The main contribution of the present work is a new class of estimator-processes, called multi-step MLE-processes, introduced in Kutoyants (2015). These estimator-processes allow us to construct approximations of the solutions of BSDEs for the three above-mentioned models with vanishing learning intervals (models (1) and (2)) or with a learning interval negligible with respect to the whole volume of observations (model (3)). Here we consider model (1) only; models (2) and (3) are left for later study.
In the present work, we consider the small volatility model, where we suppose that the unknown parameter is multi-dimensional, and we define the approximation process \(Y_{t,\varepsilon }^{\star } \) for t∈[τ_ε,T], where τ_ε→0. This approximation allows us to consider the case τ_ε=ε^δ→0 and, moreover, to choose δ close to 2. The relations between the choice of δ and the multi-step MLE-processes are the following: if δ∈(0,1), then we use the one-step MLE-process \(\vartheta _{t,\varepsilon }^{\star }\); if \(\delta \in [1,\frac {4}{3})\), then we use the two-step MLE-process \(\vartheta _{t,\varepsilon }^{\star \star }\); if \(\delta \in [\frac {4}{3}, \frac {3}{2})\), then we use the three-step MLE-process \(\vartheta _{t,\varepsilon }^{\star \star \star }\).
In Kutoyants (2015) we already studied the multi-step MLE-process for an ergodic diffusion process, and the structure of the estimator-process proposed in the present work is quite similar.
Note that the multi-step ML-estimators, like the well-known one-step ML-estimators, are based on the so-called Fisher-score device proposed by Fisher (1925) and studied by Le Cam (1956). Let us recall this construction. Suppose that we have n i.i.d. r.v.'s X^n=(X_1,…,X_n) with smooth density function f(𝜗,x) and denote ℓ(𝜗,x)=ln f(𝜗,x). The maximum likelihood equation is
$$\begin{array}{@{}rcl@{}} \sum_{j=1}^{n}\dot\ell\left({\hat{\vartheta}}_{n},X_{j}\right)=0. \end{array} $$
Here and in the rest of the paper, the dot means differentiation w.r.t. 𝜗. If we expand it in the vicinity of the true value 𝜗_0, we obtain
$$\begin{array}{@{}rcl@{}} 0=\sum_{j=1}^{n}\dot\ell\left(\vartheta_{0},X_{j}\right)+\sum_{j=1}^{n}\ddot\ell\left(\vartheta_{0},X_{j}\right)\left({\hat{\vartheta}}_{n}-\vartheta_{0}\right)+\cdots. \end{array} $$(4)
Therefore
$$\begin{array}{@{}rcl@{}} {\hat{\vartheta}}_{n}-\vartheta_{0}\approx \left(-\frac{1}{n}\sum_{j=1}^{n}\ddot\ell\left(\vartheta_{0},X_{j}\right)\right)^{-1}\frac{1}{n}\sum_{j=1}^{n}\dot\ell\left(\vartheta_{0},X_{j}\right). \end{array} $$
Note that
$$\begin{array}{@{}rcl@{}} -\frac{1}{n}\sum_{j=1}^{n}\ddot\ell\left(\vartheta_{0},X_{j}\right)\longrightarrow {\mathbb{I}}\left(\vartheta_{0}\right), \end{array} $$(5)
where \({\mathbb {I}}\left (\vartheta _{0}\right) \) is the Fisher information. Suppose that we have a preliminary estimator \({\bar {\vartheta }}_{n}\) such that the family \(\sqrt{n}\left({\bar{\vartheta}}_{n}-\vartheta_{0}\right)\) is bounded in probability. Keeping in mind the relations (4)-(5), the one-step MLE \(\vartheta _{n}^{\star }\) is defined as follows:
$$\begin{array}{@{}rcl@{}} \vartheta_{n}^{\star }={\bar{\vartheta}}_{n}+{\mathbb{I}}\left({\bar{\vartheta}}_{n}\right)^{-1}\frac{1}{n}\sum_{j=1}^{n}\dot\ell\left({\bar{\vartheta}}_{n},X_{j}\right). \end{array} $$(6)
This estimator is already asymptotically efficient because its limit variance is \({\mathbb{I}}(\vartheta_{0})^{-1}\):
$$\begin{array}{@{}rcl@{}} \sqrt{n}\left(\vartheta_{n}^{\star }-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,{\mathbb{I}}\left(\vartheta_{0}\right)^{-1}\right). \end{array} $$
Therefore, this Fisher-score device allows us to improve the preliminary estimator up to an asymptotically efficient one (see details, e.g., in Lehmann and Romano (2005)).
Moreover, this device can be applied even in the case of a preliminary estimator with a rate of convergence worse than \(\sqrt {n}\) (see, e.g., Robinson (1988); Kamatani and Uchida (2015)). For continuous-time stochastic processes such a construction was used, for example, in Skorohod and Khasminskii (1996).
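The Fisher-score device is easy to try numerically. The following minimal sketch (an illustration added here, not taken from the paper) uses the Cauchy location model, where the MLE has no closed form: the sample median serves as the \(\sqrt{n}\)-consistent preliminary estimator, and the Fisher information of this model equals 1/2, so \({\mathbb{I}}^{-1}=2\); all sample sizes and parameter values below are arbitrary choices.

```python
import math
import random

def score(theta, x):
    # d/dtheta of log f(theta, x) for the Cauchy(theta, 1) density
    d = x - theta
    return 2.0 * d / (1.0 + d * d)

def one_step_mle(sample):
    # preliminary sqrt(n)-consistent estimator: the sample median
    s = sorted(sample)
    n = len(s)
    prelim = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    # Fisher information of the Cauchy location model is 1/2, so I^{-1} = 2;
    # add one Fisher-score step evaluated at the preliminary value
    improved = prelim + 2.0 / n * sum(score(prelim, x) for x in sample)
    return prelim, improved

random.seed(1)
theta0 = 3.0
# Cauchy(theta0, 1) variates via the inverse-cdf transform
sample = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(20000)]
prelim, improved = one_step_mle(sample)
```

One Fisher-score step moves the median-based preliminary value to an estimator with asymptotic variance \(2/n\) instead of \(\pi^{2}/(4n)\).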
The one-step MLE-process, introduced in Kutoyants (2015), for this model of observations can be written as follows. Let us denote by \({\bar {\vartheta }}_{n}\) the preliminary estimator constructed from the first N=[n^δ] observations X^N=(X_1,…,X_N) with \(\delta \in (\frac {1}{2},1)\). Then the one-step MLE-process \(\vartheta _{n}^{\star }=\left (\vartheta _{k,n}^{\star }, N+1\leq k\leq n\right)\) is defined by the equality
$$\begin{array}{@{}rcl@{}} \vartheta_{k,n}^{\star }={\bar{\vartheta}}_{n}+{\mathbb{I}}\left({\bar{\vartheta}}_{n}\right)^{-1}\frac{1}{k}\sum_{j=1}^{k}\dot\ell\left({\bar{\vartheta}}_{n},X_{j}\right),\qquad N+1\leq k\leq n, \end{array} $$
and for k=[sn], s∈(0,1], we have the convergence
$$\begin{array}{@{}rcl@{}} \sqrt{k}\left(\vartheta_{k,n}^{\star }-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,{\mathbb{I}}\left(\vartheta_{0}\right)^{-1}\right). \end{array} $$
Here s is fixed and n→∞. Therefore \(\vartheta _{n}^{\star } \) is a good estimator-process, i.e., \(\vartheta _{k,n}^{\star } \) depends on X^k=(X_1,…,X_k) only, is easy to calculate, and is asymptotically efficient because it is asymptotically equivalent to the MLE. For the details see Kutoyants and Motrunich (2016).
The one-step MLE-process in the case of the ergodic diffusion forward Eq. (3) can be illustrated as follows. Suppose that we have a preliminary estimator \({\bar {\vartheta }}_{T^{\delta }} \) constructed from the observations \(X^{T^{\delta } }=\left (X_{t}, 0 \leq t\leq T^{\delta }\right)\) with \(\delta \in (\frac {1}{2},1]\). Then the one-step MLE-process \(\vartheta ^{\star }_{t,T}, T^{\delta } <t\leq T,\) based on the Fisher-score device (4), (6) has the following form
$$\begin{array}{@{}rcl@{}} \vartheta^{\star }_{t,T}={\bar{\vartheta}}_{T^{\delta }}+{\mathbb{I}}_{t}\left({\bar{\vartheta}}_{T^{\delta }}\right)^{-1}\int_{0}^{t}\frac{\dot S\left({\bar{\vartheta}}_{T^{\delta }},X_{s}\right)}{\sigma \left(X_{s}\right)^{2}}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{T^{\delta }},X_{s}\right)\mathrm{d}s\right], \end{array} $$
where \({\mathbb{I}}_{t}\left(\vartheta \right)\) is the Fisher information accumulated up to time t. This estimator-process is asymptotically efficient (t=rT, r∈(0,1]),
$$\begin{array}{@{}rcl@{}} \sqrt{t}\left(\vartheta^{\star }_{t,T}-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,{\mathbb{I}}\left(\vartheta_{0}\right)^{-1}\right), \end{array} $$
where \({\mathbb{I}}\left(\vartheta \right)\) denotes the Fisher information per unit of time of the ergodic model (see Kutoyants (2015)), and provides the asymptotically efficient estimator-processes
$$\begin{array}{@{}rcl@{}} Y^{\star }_{t}=u\left(t,X_{t},\vartheta^{\star }_{t,T}\right),\qquad Z^{\star }_{t}=u'_{x}\left(t,X_{t},\vartheta^{\star }_{t,T}\right)\sigma \left(X_{t}\right) \end{array} $$
of the solution (Y t ,Z t ) of the BSDE.
Forward equation with small volatility
We are given a function f(t,x,y,z) defined on \(\left [0,T\right ]\times {\mathcal {R}}^{k}\times {\mathcal {R}}\times {\mathcal {R}}^{k}\), a function \(\Phi (x), x\in {\mathcal {R}}^{k}\), and a k-dimensional diffusion process (forward)
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,t,X_{t}\right)\mathrm{d}t+\varepsilon \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}=x_{0}, \; 0\leq t\leq T. \end{array} $$(9)
Here \(\vartheta \in \Theta \subset {\mathcal {R}}^{d}\), Θ is an open bounded set, and \(W_{t}=\left ({W^{1}_{t}},\ldots,{W^{k}_{t}}\right),0\leq t\leq T,\) is a standard k-dimensional Wiener process.
Introduce the condition \({\mathfrak L}\).
The functions f(t,x,y,z) and Φ(x), the vector S(𝜗,t,x)=(S_l(𝜗,t,x), l=1,…,k), and the k×k matrix σ(t,x)=(σ_{lm}(t,x)) are smooth and satisfy, for some p>0, a polynomial growth condition of the type
$$\begin{array}{@{}rcl@{}} \left|f\left(t,x,y,z\right)\right|+\left|\Phi \left(x\right)\right|\leq C\left(1+\left|x\right|^{p}\right). \end{array} $$
We must find a couple of stochastic processes \(\left (Y_{t,\varepsilon }^{\star }, Z_{t,\varepsilon }^{\star },0\leq t\leq T\right)\) which approximate well the solution of the BSDE
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t}=-f\left(t,X_{t},Y_{t},Z_{t}\right)\mathrm{d}t+Z_{t}\,\mathrm{d}W_{t}, \quad 0\leq t\leq T, \end{array} $$(10)
satisfying the condition Y_T=Φ(X_T).
Let us denote by \( x_{t}\left (\vartheta \right)=\left (x_{t}^{(1)}(\vartheta),\ldots, x_{t}^{(k)}(\vartheta)\right),0\leq t\leq T,\) the solution of the system of ordinary differential equations
$$\begin{array}{@{}rcl@{}} \frac{\mathrm{d}x_{t}\left(\vartheta \right)}{\mathrm{d}t}=S\left(\vartheta,t,x_{t}\left(\vartheta \right)\right), \quad x_{0}, \; 0\leq t\leq T. \end{array} $$
The true value is 𝜗_0 and we let x_t=x_t(𝜗_0). We have the estimates: with probability 1,
$$\begin{array}{@{}rcl@{}} \sup_{0\leq t\leq T}\left|X_{t}-x_{t}\right|\leq C\varepsilon \sup_{0\leq t\leq T}\left|W_{t}\right|, \end{array} $$(11)
and for any p>0
$$\begin{array}{@{}rcl@{}} \mathbf{E}_{\vartheta_{0}}\sup_{0\leq t\leq T}\left|X_{t}-x_{t}\right|^{p}\leq C\varepsilon^{p}. \end{array} $$(12)
For the proof see, e.g., Kutoyants (1994).
We have a family of problems of parameter estimation by the observations X^t=(X_s, 0≤s≤t), where t∈(0,T], and therefore we need a family of estimators \({\bar {\vartheta }}_{t,\varepsilon }, 0<t\leq T\). Let \(\left ({\mathcal {C}}^{k}\left (\left [0,t\right ]\right),{\mathfrak B}_{t}\right)\) be the measurable space of continuous vector-functions on [0,t] with its Borel σ-algebra \({\mathfrak B}_{t} \). Denote by \(\left \{\mathbf {P}_{\vartheta }^{\left (\varepsilon,t \right)},\vartheta \in \Theta \right \} \) the family of measures induced in this space by the solutions of (9) with different 𝜗∈Θ. Note that these measures are equivalent (see Liptser and Shiryaev (2001)) and the likelihood ratio function is
$$\begin{array}{@{}rcl@{}} L\left(\vartheta,X^{t}\right)=\frac{\mathrm{d}\mathbf{P}_{\vartheta }^{\left(\varepsilon,t \right)}}{\mathrm{d}\mathbf{P}_{0}^{\left(\varepsilon,t \right)}}=\exp \left\{\int_{0}^{t}\frac{S\left(\vartheta,s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}}{\varepsilon^{2}}\,\mathrm{d}X_{s}-\int_{0}^{t}\frac{S\left(\vartheta,s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}S\left(\vartheta,s,X_{s}\right)}{2\varepsilon^{2}}\,\mathrm{d}s\right\}. \end{array} $$(13)
Here \(\mathbf {P}_{0 }^{\left (\varepsilon,t \right)} \) is the measure which corresponds to the observations (9) with S(𝜗,t,X_t)≡0. The matrix \(\mathbb {A}\left (s,x\right)\) is
$$\begin{array}{@{}rcl@{}} \mathbb{A}\left(s,x\right)=\sigma \left(s,x\right)\sigma \left(s,x\right)^{*}. \end{array} $$
Recall that the MLE \({\hat {\vartheta }}_{\varepsilon,t}\) is defined by the equation
$$\begin{array}{@{}rcl@{}} L\left({\hat{\vartheta}}_{\varepsilon,t},X^{t}\right)=\sup_{\vartheta \in \Theta }L\left(\vartheta,X^{t}\right). \end{array} $$
Introduce the Regularity conditions \({\mathfrak R}\).
1. The function S(𝜗,t,x) is two-times continuously differentiable w.r.t. 𝜗 and the derivatives are Lipschitz in x.
2. We suppose that there exists a positive constant m such that for any \(\lambda \in {\mathcal {R}}^{k}\) we have
$$\begin{array}{@{}rcl@{}} m^{-1}\left\|\lambda \right\|^{2}\leq \lambda^{*} \mathbb{A}\left(s,x\right)\lambda \leq m\left\|\lambda \right\|^{2}. \end{array} $$(14)
3. The Fisher information matrix
$$\begin{array}{@{}rcl@{}} \mathbb{I}_{t}(\vartheta)={{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta,s,x_{s}(\vartheta)\right)^{*}\mathbb{A}\left(s,x_{s}(\vartheta)\right)^{-1}\dot{\mathbb{S}}\left(\vartheta,s,x_{s}(\vartheta)\right)\,\mathrm{d}s, \end{array} $$
is uniformly nondegenerate:
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta \in\Theta }\inf\limits_{\left|\lambda \right|=1} \lambda^{*}\mathbb{I}_{t}(\vartheta)\lambda >0. \end{array} $$
Here \(\lambda \in {\mathcal {R}}^{d}\), the dot means differentiation w.r.t. 𝜗, and \(\dot {\mathbb {S}}\left (\vartheta,s,x\right) \) is a k×d matrix.
4. Identifiability condition: for any ν>0 and any t∈(0,T] the estimate
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta_{0}\in\Theta }\inf\limits_{\left|\vartheta -\vartheta_{0}\right|>\nu }{{\int\nolimits}_{0}^{t}}\delta \left(s,x_{s},\vartheta,\vartheta_{0}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\delta \left(s,x_{s},\vartheta,\vartheta_{0}\right)\mathrm{d}s>0 \end{array} $$
holds. Here δ(s,x_s,𝜗,𝜗_0)=S(𝜗,s,x_s)−S(𝜗_0,s,x_s).
The Regularity conditions allow us to prove the following properties of the MLE \({\hat {\vartheta }}_{\varepsilon,t}, t\in (0,T]\).
1. It is uniformly consistent: for any ν>0 and any compact K⊂Θ,
$$\begin{array}{@{}rcl@{}} \lim\limits_{\varepsilon \rightarrow 0}\sup\limits_{\vartheta_{0}\in \textbf{K}} \mathbf{P}_{\vartheta_{0} }^{\left(\varepsilon,t \right)}\left(\left|{\hat{\vartheta}}_{\varepsilon,t}-\vartheta_{0} \right|>\nu \right)=0. \end{array} $$
2. It is asymptotically normal uniformly on compacts K⊂Θ:
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left({\hat{\vartheta}}_{\varepsilon,t}-\vartheta_{0} \right)\Longrightarrow {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta_{0} \right)^{-1}\right). \end{array} $$
3.
The polynomial moments converge and it is asymptotically efficient.
These properties were established in Kutoyants (1994) in the case of one-dimensional diffusion processes (9). There are no essential difficulties in applying the same proof in our case. The multi-step MLE-processes presented below have exactly the same asymptotic properties, but can be calculated more easily.
Introduce the family of functions
$$\begin{array}{@{}rcl@{}} \left\{u\left(t,x,\vartheta \right),\; 0\leq t\leq T,\; x\in {\mathcal{R}}^{k},\; \vartheta \in \Theta \right\} \end{array} $$
such that for all 𝜗∈Θ the function u(t,x,𝜗) satisfies the PDE
$$\begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}+\sum_{l=1}^{k}S_{l}\left(\vartheta,t,x\right)\frac{\partial u}{\partial x_{l}}+\frac{\varepsilon^{2}}{2}\sum_{l,m=1}^{k}\mathbb{A}_{lm}\left(t,x\right)\frac{\partial^{2} u}{\partial x_{l}\partial x_{m}}=-f\left(t,x,u,\varepsilon\, u'_{x}\sigma \left(t,x\right)\right) \end{array} $$
and the condition \(u(T,x,\vartheta)=\Phi (x), x\in {\mathcal {R}}^{k}\).
The limit of the function u(t,x,𝜗) as ε→0 is denoted by u∘(t,x,𝜗). The function u∘(t,x,𝜗) satisfies the first-order equation
$$\begin{array}{@{}rcl@{}} \frac{\partial u_{\circ }}{\partial t}+\sum_{l=1}^{k}S_{l}\left(\vartheta,t,x\right)\frac{\partial u_{\circ }}{\partial x_{l}}=-f\left(t,x,u_{\circ },0\right) \end{array} $$
with the final value u∘(T,x,𝜗)=Φ(x). Below, \(\dot u_{\circ }\left (t,x,\vartheta \right) \) and \(\ddot u_{\circ }\left (t,x,\vartheta \right) \) denote the derivatives of this function w.r.t. 𝜗.
Introduce the condition \({\mathfrak U}\).
1. The function u(t,x,𝜗) is two-times continuously differentiable w.r.t. 𝜗 and the derivatives \(\dot u\left (t,x,\vartheta \right)\) and \(\ddot u\left (t,x,\vartheta \right)\) are Lipschitz w.r.t. x uniformly in 𝜗∈Θ.
2. The function u(t,x,𝜗) and its derivatives \(\dot u\left (t,x,\vartheta \right) \) and \(\dot u'_{x}\left (t,x,\vartheta \right) \) converge, uniformly in t∈[0,T], to \(u_{\circ }\left (t,x,\vartheta \right),\dot u_{\circ }\left (t,x,\vartheta \right),\dot u_{\circ,x}'\left (t,x,\vartheta \right)\), respectively.
Sufficient conditions providing these properties of u(t,x,𝜗) can be found in Freidlin and Wentzell (1998), Theorem 2.3.1. Note that the derivatives \(\dot u\left (t,x,\vartheta \right)\) and \(\ddot u\left (t,x,\vartheta \right)\) satisfy linear PDEs of the same type.
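For instance, assuming the limit function satisfies the first-order equation \(\partial_{t} u_{\circ }+S\left(\vartheta,t,x\right)^{*}\partial_{x} u_{\circ }=-f\left(t,x,u_{\circ },0\right)\) (the standard form in this small-noise setting), formal differentiation w.r.t. 𝜗 shows that \(\dot u_{\circ }\) solves a linear equation of the same transport type,
$$\begin{array}{@{}rcl@{}} \frac{\partial \dot u_{\circ }}{\partial t}+S\left(\vartheta,t,x\right)^{*}\frac{\partial \dot u_{\circ }}{\partial x}+\dot S\left(\vartheta,t,x\right)^{*}\frac{\partial u_{\circ }}{\partial x}=-f'_{y}\left(t,x,u_{\circ },0\right)\dot u_{\circ },\qquad \dot u_{\circ }\left(T,x,\vartheta \right)=0, \end{array} $$
so the smoothness of \(\dot u_{\circ }\) (and similarly of \(\ddot u_{\circ }\)) can be studied by the same methods.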
If we let Y_t=u(t,X_t,𝜗), then, by Itô's formula, we obtain BSDE (10) with
$$\begin{array}{@{}rcl@{}} Z_{t}=\varepsilon\, u'_{x}\left(t,X_{t},\vartheta \right)\sigma \left(t,X_{t}\right). \end{array} $$
Recall that our goal is to construct an asymptotically efficient approximation of the couple (Y_t, Z_t). To compare all possible estimators, we introduce lower bounds on the mean-square risks. This is a version of the well-known Hajek-Le Cam minimax risk bound (see, e.g., Ibragimov and Has'minskii (1981), Theorem 2.12.1).
Theorem 1
Suppose that the conditions \({\mathfrak L}, {\mathfrak R}\) and \( {\mathfrak U}\) are fulfilled. Then for all estimators \({\bar {Y}}_{t}\) and \({\bar {Z}}_{t}\) and all t∈(0,T] we have the relations
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\varliminf\limits_{\varepsilon \rightarrow 0}\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varepsilon^{-2}\,{\mathbf{E}}_{\vartheta }\left({\bar{Y}}_{t}-Y_{t}\right)^{2}\geq \dot u_{\circ }\left(t,x_{t},\vartheta_{0}\right)^{*}{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ }\left(t,x_{t},\vartheta_{0}\right), \end{array} $$(15)
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\varliminf\limits_{\varepsilon \rightarrow 0}\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varepsilon^{-4}\,{\mathbf{E}}_{\vartheta }\left|{\bar{Z}}_{t}-Z_{t}\right|^{2}\geq \mathrm{tr}\left\{\dot{G}\left(t,x_{t},\vartheta_{0}\right)^{*}{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\dot{G}\left(t,x_{t},\vartheta_{0}\right)\right\}, \end{array} $$(16)
where \(G\left(t,x,\vartheta \right)=u_{\circ,x}'\left(t,x,\vartheta \right)\sigma \left(t,x\right)\) and \(\dot G\) denotes its d×k matrix of derivatives w.r.t. 𝜗.
Proof
We first verify that the family of measures is locally asymptotically normal (LAN) and then we apply the proof of the Hajek-Le Cam lower bound (Ibragimov and Has'minskii 1981), which provides us with (15), (16). We present here the necessary modification of the proof given in Ibragimov and Has'minskii (1981). Usually this inequality is considered for a risk like \({\mathbf {E}}_{\vartheta } \left |{\bar {\vartheta }}_{\varepsilon } -\vartheta \right |^{2}\), while we are interested in the risk \({\mathbf {E}}_{\vartheta } \left |{\bar {Y}}_{t,\varepsilon } -Y_{t} \right |^{2}\), where Y_t is a random process. Another point: the random vector Δ_t (see below) is, in general, only asymptotically normal, while in our case it has an exactly Gaussian distribution; that is why the proof is slightly simplified.
Let us denote \(\varphi _{\varepsilon } =\varepsilon {\mathbb {I}_{t}}^{-\frac {1}{2}} \), where \({\mathbb {I}_{t}}={\mathbb {I}_{t}}\left (\vartheta _{0}\right)\), and introduce the normalized likelihood ratio
$$\begin{array}{@{}rcl@{}} Z_{t,\varepsilon }\left(v\right)=\frac{L\left(\vartheta_{0}+\varphi_{\varepsilon }v,X^{t}\right)}{L\left(\vartheta_{0},X^{t}\right)},\qquad v\in {\mathcal{V}}_{\varepsilon }=\left\{v:\ \vartheta_{0}+\varphi_{\varepsilon }v\in \Theta \right\}. \end{array} $$
We can write
$$\begin{array}{@{}rcl@{}} \ln Z_{t,\varepsilon }\left(v\right)=v^{*}\Delta_{t}-\frac{1}{2}\left|v\right|^{2}+r_{\varepsilon }, \end{array} $$
where r_ε→0 and the vector
$$\begin{array}{@{}rcl@{}} \Delta_{t}={\mathbb{I}_{t}}^{-\frac{1}{2}}\int_{0}^{t}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\sigma \left(s,x_{s}\right)\mathrm{d}W_{s}\sim {\mathcal{N}}\left(0,\mathbb{J}\right). \end{array} $$
Here \(\mathbb {J}\) is a unit d×d matrix.
Hence, the family of measures \(\left \{\mathbf {P}_{\vartheta }^{\left (\varepsilon,t \right)},\vartheta \in \Theta \right \} \) is LAN in Θ (Ibragimov and Has’minskii 1981; Kutoyants 1994).
Below, M>0, \({\mathcal {K}}_{M}\) is a cube in \({\mathcal {R}}^{d}\) whose vertices have coordinates ±M, so that its volume is (2M)d and 𝜗 v =𝜗 0+φ ε v.
We have
where |h ε |≤C ε. Hence, if we denote
and introduce the vector \({\bar {v}}_{\varepsilon } \) such that \({\bar {b}}_{\varepsilon } = \dot u\left (t,X_{t},\vartheta _{0} \right)^{*}{\mathbb {I}}_{t}^{-\frac {1}{2}}{\bar {v}}_{\varepsilon } \), then we can write
Further, we use the following result, known as Scheffé's lemma.
Lemma 1
Let the random variables Z_ε≥0, ε∈(0,1], converge in probability to a random variable Z≥0 as ε→0, and let E Z_ε=E Z=1. Then
$$\begin{array}{@{}rcl@{}} \lim_{\varepsilon \rightarrow 0}\mathbf{E}\left|Z_{\varepsilon }-Z\right|=0. \end{array} $$
For the proof see, e.g., Theorem A.4 in Ibragimov and Has’minskii (1981).
Recall that \({\mathbf {E}}_{\vartheta _{0}} Z_{t,\varepsilon }(v)={\mathbf {E}}_{\vartheta _{0}} Z_{t }(v)=1\), where \(\ln Z_{t }(v)=v^{*}\Delta _{t}-\frac {1}{2}\left |v\right |^{2} \). Hence for any K>0
Here we denoted \(\left |D\right |^{2}_{K}=\left |D\right |^{2}\wedge K\). This allows us to write
Then
and
where w=v−Δ_t and \(\tilde w_{\varepsilon } = {\bar {v}}_{\varepsilon } -\Delta _{t}\). Introduce the set \({\mathcal {C}}_{M}\) on which each coordinate of \(\Delta _{t}=\left (\Delta _{t}^{(1)},\ldots,\Delta _{t}^{(d)} \right)\) is at most \(M-\sqrt {M}\) in absolute value, i.e., \(\left |\Delta _{t}^{(l)} \right |\leq M-\sqrt {M}\). Then
because \({\mathcal {K}}_{\sqrt {M}}\subset {\mathcal {C}}_{{M}} \). By Andersen’s Lemma (see, e.g., Ibragimov and Has’minskii (1981), Lemma 2.10.2)
Note that as M→∞ we obtain the limits
and
The last steps are ε→0 and K→∞
The detailed proof can be found in Ibragimov and Has’minskii (1981), Theorem 2.12.1.
Therefore the bound (15) is verified. The bound (16) is proved in a similar way. Note that Z_t=ε u'_x(t,X_t,𝜗)σ(t,X_t). We write an arbitrary estimator \({\bar {Z}}_{t}\) of Z_t in the form \({\bar {Z}}_{t}=\varepsilon \tilde Z_{t}\). Then, for \(\varepsilon ^{-1}\left ({\bar {Z}}_{t}-Z_{t}\right)\), we follow the proof given above. □
Definition
Suppose that the conditions \({\mathfrak L}, {\mathfrak R}, {\mathfrak U}\) are fulfilled. Then we call the estimator-processes \( Y_{t}^{\star }, Z_{t}^{\star }, 0<t\leq T,\) asymptotically efficient if for all 𝜗_0∈Θ and all t∈(0,T] we have the equalities
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\lim\limits_{\varepsilon \rightarrow 0}\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varepsilon^{-2}\,{\mathbf{E}}_{\vartheta }\left(Y^{\star }_{t}-Y_{t}\right)^{2}= \dot u_{\circ }\left(t,x_{t},\vartheta_{0}\right)^{*}{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ }\left(t,x_{t},\vartheta_{0}\right), \end{array} $$(17)
$$\begin{array}{@{}rcl@{}} \lim\limits_{\nu \rightarrow 0}\lim\limits_{\varepsilon \rightarrow 0}\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu }\varepsilon^{-4}\,{\mathbf{E}}_{\vartheta }\left|Z^{\star }_{t}-Z_{t}\right|^{2}= \mathrm{tr}\left\{\dot{G}\left(t,x_{t},\vartheta_{0}\right)^{*}{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\dot{G}\left(t,x_{t},\vartheta_{0}\right)\right\}, \end{array} $$(18)
with \(G\left(t,x,\vartheta \right)=u_{\circ,x}'\left(t,x,\vartheta \right)\sigma \left(t,x\right)\).
As we do not know the value 𝜗, we propose first to estimate it using some estimator-process \(\vartheta _{\varepsilon,t}^{\star },0<t\leq T,\) and then to put
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star }=u\left(t,X_{t},\vartheta_{\varepsilon,t}^{\star }\right),\qquad Z_{t,\varepsilon }^{\star }=u'_{x}\left(t,X_{t},\vartheta_{\varepsilon,t}^{\star }\right)\sigma \left(t,X_{t}\right). \end{array} $$
Recall that, formally, the MLE-process \({\hat {\vartheta }}_{\varepsilon,t}, 0<t\leq T,\) "solves" the problem: it can be shown that, under the supposed regularity conditions, the estimator-processes \({\hat {Y}}_{t,\varepsilon }=u(t,X_{t},{\hat {\vartheta }}_{\varepsilon,t}) \) and \({\hat {Z}}_{t,\varepsilon }=u'_{x}(t,X_{t},{\hat {\vartheta }}_{\varepsilon,t})\sigma \left (t,X_{t}\right) \) are asymptotically efficient in the sense of the relations (17) and (18), respectively. But this solution cannot be called acceptable, because the calculation of \({\hat {\vartheta }}_{\varepsilon,t}\) for all t∈(0,T] is, in the general case, a computationally difficult problem. That is why we propose to use the so-called multi-step MLE-process (Kutoyants 2015), which is introduced as follows. First we construct a preliminary estimator \( {\bar {\vartheta }}_{\tau _{\varepsilon } }\) from the observations \(X^{\tau _{\varepsilon } }=\left (X_{s},0\leq s\leq \tau _{\varepsilon } \right)\) on some learning interval [0,τ_ε], where τ_ε=ε^δ with 0<δ<1, and then we propose an estimator-process \(\vartheta _{t,\varepsilon }^{\star }, \tau _{\varepsilon } \leq t\leq T,\) based on this preliminary estimator. Finally, we show that the corresponding estimators, say, \(Y_{t,\varepsilon }^{\star }=u\left (t,X_{t},\vartheta _{t,\varepsilon }^{\star }\right), \tau _{\varepsilon } \leq t\leq T,\) are asymptotically efficient.
As a preliminary estimator we propose the minimum distance estimator (MDE) \( {\bar {\vartheta }}_{\tau _{\varepsilon } }\) defined by the relation
$$\begin{array}{@{}rcl@{}} {\bar{\vartheta}}_{\tau_{\varepsilon }}=\arg\min_{\vartheta \in \Theta }\int_{0}^{\tau_{\varepsilon }}\left|X_{t}-{\hat{X}}_{t}\left(\vartheta \right)\right|^{2}\mathrm{d}t. \end{array} $$
Here the family of random processes \(\left \{\left ({\hat {X}}_{t}\left (\vartheta \right),0\leq t\leq \tau _{\varepsilon } \right),\vartheta \in \Theta \right \}\) is defined as follows:
$$\begin{array}{@{}rcl@{}} {\hat{X}}_{t}\left(\vartheta \right)=x_{0}+\int_{0}^{t}S\left(\vartheta,s,X_{s}\right)\mathrm{d}s,\qquad 0\leq t\leq \tau_{\varepsilon }. \end{array} $$
These estimators were studied in Kutoyants (1994) in the case of fixed τ_ε=τ and are also called trajectory fitting estimators, because we choose an estimator \( {\bar {\vartheta }}_{\tau _{\varepsilon } } \) which provides a trajectory \({\hat {X}}_{t}\left ({\bar {\vartheta }}_{\tau _{\varepsilon } }\right),0\leq t\leq \tau _{\varepsilon } ,\) closest to the observations X_t, 0≤t≤τ_ε. It was shown that if the conditions of regularity and the condition of identifiability: for any ν>0
$$\begin{array}{@{}rcl@{}} \inf_{\left|\vartheta -\vartheta_{0}\right|>\nu }\int_{0}^{\tau }\left|x_{t}\left(\vartheta \right)-x_{t}\left(\vartheta_{0}\right)\right|^{2}\mathrm{d}t>0 \end{array} $$(19)
hold, and the matrix
$$\begin{array}{@{}rcl@{}} \mathbb{M}_{\tau }\left(\vartheta_{0}\right)=\int_{0}^{\tau }\dot{x}_{t}\left(\vartheta_{0}\right)^{*}\dot{x}_{t}\left(\vartheta_{0}\right)\mathrm{d}t, \qquad \dot{x}_{t}\left(\vartheta_{0}\right)=\int_{0}^{t}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)\mathrm{d}s, \end{array} $$
is uniformly nondegenerate (below \(\lambda \in {\mathcal {R}}^{d}\)),
$$\begin{array}{@{}rcl@{}} \inf_{\vartheta_{0}\in\Theta }\inf_{\left|\lambda \right|=1}\lambda^{*}\mathbb{M}_{\tau }\left(\vartheta_{0}\right)\lambda >0, \end{array} $$
then the MDE is asymptotically normal,
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left({\bar{\vartheta}}_{\tau }-\vartheta_{0}\right)\Longrightarrow \xi_{\tau }, \end{array} $$
where ξ_τ is a Gaussian vector.
Note that if we have the Regularity condition 4 (identifiability) with T=τ, then the identifiability condition (19) is also fulfilled. Indeed, suppose that there exists 𝜗_1≠𝜗_0 such that
$$\begin{array}{@{}rcl@{}} \int_{0}^{\tau }\left|x_{t}\left(\vartheta_{1}\right)-x_{t}\left(\vartheta_{0}\right)\right|^{2}\mathrm{d}t=0. \end{array} $$
Then for all t∈[0,τ]
$$\begin{array}{@{}rcl@{}} x_{t}\left(\vartheta_{1}\right)=x_{t}\left(\vartheta_{0}\right), \end{array} $$
which implies
$$\begin{array}{@{}rcl@{}} S\left(\vartheta_{1},t,x_{t}\right)=S\left(\vartheta_{0},t,x_{t}\right)\quad \text{and hence}\quad \int_{0}^{\tau }\delta \left(s,x_{s},\vartheta_{1},\vartheta_{0}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\delta \left(s,x_{s},\vartheta_{1},\vartheta_{0}\right)\mathrm{d}s=0. \end{array} $$
The last equality, of course, contradicts the Regularity condition 4.
Now suppose that τ ε =ε δ with δ<1 and the matrix
is uniformly nondegenerate in 𝜗 0∈Θ (below \(\lambda \in {\mathcal {R}}^{d}\))
Then, we can obtain the asymptotics
Note that
and
Therefore, the family of random vectors \(\varepsilon ^{-1+\frac {\delta }{2}}\left ({\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}\right) \) is asymptotically normal. Moreover, following Kutoyants (1994), it can be shown that the moments are bounded, i.e.,
$$\begin{array}{@{}rcl@{}} \mathbf{E}_{\vartheta_{0}}\left|\varepsilon^{-1+\frac{\delta }{2}}\left({\bar{\vartheta}}_{\tau_{\varepsilon }}-\vartheta_{0}\right)\right|^{p}\leq C, \end{array} $$(22)
where the constant C=C(p)>0 does not depend on ε, for all p>0.
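As a numerical illustration of the trajectory-fitting idea (added here, not taken from the paper), consider the scalar linear drift S(𝜗,t,x)=𝜗x, for which \({\hat{X}}_{t}(\vartheta)=x_{0}+\vartheta \int_{0}^{t}X_{s}\,\mathrm{d}s\) and the minimization over 𝜗 is an explicit least-squares problem; the Euler step sizes and parameter values below are arbitrary.

```python
import math
import random

def simulate(theta, sigma, eps, x0, tau, n, rng):
    # Euler scheme for dX_t = theta*X_t dt + eps*sigma*X_t dW_t on [0, tau]
    dt = tau / n
    xs = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        xs.append(xs[-1] + theta * xs[-1] * dt + eps * sigma * xs[-1] * dw)
    return xs

def tfe(xs, x0, tau):
    # trajectory-fitting estimator for S(theta,t,x) = theta*x:
    # hat X_t(theta) = x0 + theta*G_t with G_t = int_0^t X_s ds, and the
    # least-squares minimizer of int_0^tau (X_t - hat X_t(theta))^2 dt is
    # theta_bar = int (X_t - x0) G_t dt / int G_t^2 dt (discretized below)
    n = len(xs) - 1
    dt = tau / n
    g = num = den = 0.0
    for i in range(1, n + 1):
        g += xs[i - 1] * dt
        num += (xs[i] - x0) * g * dt
        den += g * g * dt
    return num / den

rng = random.Random(3)
theta0, sigma, eps = 2.0, 1.0, 0.01
xs = simulate(theta0, sigma, eps, 1.0, 0.5, 2000, rng)
theta_bar = tfe(xs, 1.0, 0.5)
```

With ε=0 this discretized estimator returns 𝜗_0 exactly for the Euler scheme above, and for small ε its error is of order ε.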
Let us introduce the one-step MLE-process \(\vartheta _{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T \):
$$\begin{array}{@{}rcl@{}} \vartheta_{t,\varepsilon }^{\star }={\bar{\vartheta}}_{\tau_{\varepsilon }}+{\mathbb{I}}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon }}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)\mathrm{d}s\right]. \end{array} $$(23)
Its properties are described in the following proposition.
Proposition 1
Let the conditions \( {\mathfrak L}, {\mathfrak R}\) be fulfilled and δ∈(0,1). Then for all t∈(0,T]
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star }-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\right), \end{array} $$
and this estimator-process is asymptotically efficient. Moreover, we have the uniform consistency, i.e., for any ν>0
$$\begin{array}{@{}rcl@{}} \lim_{\varepsilon \rightarrow 0}\mathbf{P}_{\vartheta_{0}}\left(\sup_{\tau_{\varepsilon }\leq t\leq T}\left|\vartheta_{t,\varepsilon }^{\star }-\vartheta_{0}\right|>\nu \right)=0. \end{array} $$
Proof
Note that the estimator \(\vartheta _{t,\varepsilon }^{\star } \) is defined for t∈[τ_ε,T], but as τ_ε→0 we obtain, for any fixed positive t, the relation t>τ_ε for all small ε.
The substitution of the observations (9) provides us the equality
$$\begin{array}{@{}rcl@{}} \vartheta_{t,\varepsilon }^{\star }={\bar{\vartheta}}_{\tau_{\varepsilon }}+{\mathbb{I}}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon }}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[S\left(\vartheta_{0},s,X_{s}\right)-S\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)\right]\mathrm{d}s \\ +\,\varepsilon\,{\mathbb{I}}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon }}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\sigma \left(s,X_{s}\right)\mathrm{d}W_{s}. \end{array} $$
Recall that the vector-process (X_s, 0≤s≤T) converges uniformly in s to the deterministic vector-function (x_s, 0≤s≤T), and the estimator \({\bar {\vartheta }}_{\tau _{\varepsilon }} \) is consistent. Therefore, we have the convergence in probability
$$\begin{array}{@{}rcl@{}} {\mathbb{I}}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon }}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon }},s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\sigma \left(s,X_{s}\right)\mathrm{d}W_{s}\longrightarrow {\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\sigma \left(s,x_{s}\right)\mathrm{d}W_{s}. \end{array} $$
For the other terms, we first write the Taylor expansion
because \(\vartheta _{0}- {\bar {\vartheta }}_{\tau _{\varepsilon } }=O\left (\varepsilon ^{1-\frac {\delta }{2}}\right) \). Then, we denote
and write
The following estimate can be easily verified
because X s −x s =O(ε), \({\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}= O\left (\varepsilon ^{1-\frac {\delta }{2}}\right)\) and
Hence
The uniform consistency can be shown by following the proof of the corresponding statement in Kutoyants (2015), Theorem 1. □
Let us define the estimator-processes \(Y^{\star }_{\varepsilon } =\left (Y_{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T \right)\) and \(Z^{\star }_{\varepsilon } =\left (Z_{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T \right)\) as follows:
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star }=u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star }\right),\qquad Z_{t,\varepsilon }^{\star }=u'_{x}\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star }\right)\sigma \left(t,X_{t}\right). \end{array} $$
Theorem 2
Suppose the conditions \({\mathfrak L},{\mathfrak R},{\mathfrak U}\) and (21) hold. Then the estimator-processes \(Y_{\varepsilon }^{\star },Z_{\varepsilon }^{\star }\) admit the representations
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star }=Y_{t}+\varepsilon\,\eta_{t}+o\left(\varepsilon \right),\qquad Z_{t,\varepsilon }^{\star }=Z_{t}+\varepsilon^{2}\,\zeta_{t}+o\left(\varepsilon^{2} \right), \end{array} $$
where the Gaussian process
$$\begin{array}{@{}rcl@{}} \eta_{t}=\dot u_{\circ }\left(t,x_{t},\vartheta_{0}\right)^{*}{\mathbb{I}}_{t}\left(\vartheta_{0}\right)^{-1}\int_{0}^{t}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\sigma \left(s,x_{s}\right)\mathrm{d}W_{s} \end{array} $$
and the Gaussian process \(\zeta_{t}\) is defined analogously, with \(\dot u_{\circ }\) replaced by the 𝜗-derivative of \(u'_{\circ,x}\sigma \). The random processes
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left(Y_{t,\varepsilon }^{\star }-Y_{t}\right),\qquad \varepsilon^{-2}\left(Z_{t,\varepsilon }^{\star }-Z_{t}\right), \end{array} $$
for any τ∈(0,T], converge in probability to the processes \(\eta_{t}\) and \(\zeta_{t}\),
respectively, uniformly in t∈[τ,T]. Moreover, these approximations are asymptotically efficient in the sense of (17), (18).
Proof
By the condition \({\mathfrak U}\), we obtain the representation
and for any τ∈(0,T] we have the convergence in probability
□
Therefore, the representations (25),(26) follow now from (24).
More detailed analysis shows that the convergences O(1) in (24),(25) are uniform in t∈[τ,T] due to (11). Moreover, we have the convergence of moments, uniform on compacts 𝜗_0∈K, as well, because we have (12) and the moments of the preliminary estimator are bounded (22). Therefore, the estimates used above can also be written for the moments. This convergence of moments provides the asymptotic efficiency of the estimators \(Y^{\star }_{\varepsilon },Z^{\star }_{\varepsilon } \).
The estimators \(Y^{\star }_{t,\varepsilon },Z^{\star }_{t,\varepsilon },\tau _{\varepsilon } \leq t\leq T, \) are given for the values t>τ_ε=ε^δ with δ∈(0,1). It is interesting to have a shorter learning interval and, therefore, a longer estimation period for Y_t, Z_t. That is why we propose the two-step MLE-process, which uses a preliminary estimator with a worse rate of convergence. Let us take \(\delta \in [1,\frac {4}{3})\) and introduce the second preliminary estimator-process
and the two-step MLE-process \(\vartheta _{t,\varepsilon }^{\star \star },\tau _{\varepsilon } \leq t\leq T \)
For the preliminary estimator we obtain the same estimate (22), but with different τ ε . Further, for the first preliminary estimator similar calculations as above provide us the estimates
For the two-step MLE-process we have
Therefore if we take γ such that γ+δ<2 and \(\gamma -\frac {\delta }{2}>0\), say, \(\gamma <\frac {2}{3}\), then we obtain
Now the estimator-processes \(Y^{\star \star }_{\varepsilon },Z^{\star \star }_{\varepsilon }\), defined with the help of the two-step MLE-process as
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star \star }=u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star \star }\right),\qquad Z_{t,\varepsilon }^{\star \star }=u'_{x}\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star \star }\right)\sigma \left(t,X_{t}\right), \end{array} $$
are known on the larger time interval [τ_ε,T].
Of course, we can continue this process and reduce the learning interval once more by introducing the three-step MLE-process \(\vartheta _{t,\varepsilon }^{\star \star \star }\) as follows. The learning interval is [0,τ_ε], τ_ε=ε^δ, where \(\delta \in [\frac {4}{3}, \frac {3}{2})\). The first preliminary estimator-process is
the second is
and the three-step MLE-process
Similar calculations provide us the relations \({\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}=\varepsilon ^{1-\frac {\delta }{2}}O(1) \),
and
Hence, if we choose δ, γ_1 and γ_2 such that
then once more we obtain an asymptotically efficient MLE-process.
Therefore, we obtain the corresponding approximations \(Y_{t,\varepsilon }^{\star \star \star },Z_{t,\varepsilon }^{\star \star \star }\) for the values t∈[τ_ε,T] with an essentially smaller τ_ε than in the case of the one-step MLE-process.
Example
Black and Scholes model. Suppose that the forward equation is
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=\vartheta X_{t}\,\mathrm{d}t+\varepsilon \sigma X_{t}\,\mathrm{d}W_{t}, \quad X_{0}=x_{0}, \; 0\leq t\leq T, \end{array} $$
and the functions f(x,y,z)=−βy−γxz and Φ(x) are given. The function Φ(x) is continuous and satisfies the condition |Φ(x)|≤C(1+|x|^p) with some constants C>0 and p>0. We have to find (Y_t, Z_t) such that
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t}=\left(\beta Y_{t}+\gamma X_{t}Z_{t}\right)\mathrm{d}t+Z_{t}\,\mathrm{d}W_{t}, \quad 0\leq t\leq T, \end{array} $$
and Y T =Φ(X T ).
The corresponding PDE is
$$\begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}+\vartheta x\frac{\partial u}{\partial x}+\frac{\varepsilon^{2}\sigma^{2}x^{2}}{2}\,\frac{\partial^{2} u}{\partial x^{2}}=\beta u+\gamma \varepsilon \sigma x^{2}\frac{\partial u}{\partial x},\qquad u\left(T,x,\vartheta \right)=\Phi \left(x\right). \end{array} $$
To write its solution we change the variables \(s=T-t, \bar x=\ln x\) and let \(u\left (t,\bar x,\vartheta \right)= e^{\mu (\vartheta) \bar x+\lambda (\vartheta) s}v\left (s,\bar x,\vartheta \right)\), where
Then, we obtain the reduced equation
whose solution is well known:
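This classical solution can be sketched as follows (a reconstruction in our own notation, under the assumption that the forward equation is a geometric Brownian motion with volatility \(\varepsilon \sigma \), so that the reduced equation is the heat equation \(\partial _{s}v=\frac {\varepsilon ^{2}\sigma ^{2}}{2}\,\partial _{\bar x}^{2}v\)):

```latex
v\left(s,\bar{x},\vartheta\right)
  = \frac{1}{\sqrt{2\pi\varepsilon^{2}\sigma^{2}s}}
    \int_{-\infty}^{\infty}
    \exp\left\{-\frac{\left(\bar{x}-q\right)^{2}}{2\varepsilon^{2}\sigma^{2}s}\right\}
    v\left(0,q,\vartheta\right)\mathrm{d}q,
\qquad
v\left(0,q,\vartheta\right)=e^{-\mu(\vartheta)q}\,\Phi\left(e^{q}\right),
```

where the initial condition follows from the terminal condition \(u(T,x,\vartheta)=\Phi (x)\) and the substitution \(u=e^{\mu (\vartheta)\bar x+\lambda (\vartheta)s}v\) evaluated at \(s=0\).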
Let us fix \(\tau _{\varepsilon } =\varepsilon ^{\frac {3}{4}}\) and introduce the preliminary TFE
Of course, we also can write the MLE \({\hat {\vartheta }}_{\tau _{\varepsilon } }\)
but since in this work we use the TFE, we show how to calculate \({\bar {\vartheta }}_{\tau _{\varepsilon } }\). The Fisher information is \(\mathbb {I}_{t}\left (\vartheta \right)=t\sigma ^{-2}\). The one-step MLE-process is
Moreover, it is easy to see that
Hence, the estimators \(\vartheta _{t,\varepsilon }^{\star } \) and \({\hat {\vartheta }}_{t,\varepsilon } \) have the same limit distribution. Therefore, the estimator-process
It is easy to see that \(Y_{t,\varepsilon }^{\star }\longrightarrow \Phi \left (X_{T}\right) \) as t→T. The expression for \(Z_{t,\varepsilon }^{\star }\) can be written as well.
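As a numerical illustration of the preliminary TFE on the learning interval (our own sketch, assuming the forward equation is the geometric Brownian motion \(\mathrm {d}X_{t}=\vartheta X_{t}\,\mathrm {d}t+\varepsilon \sigma X_{t}\,\mathrm {d}W_{t}\)), one can fit the noise-free trajectory \(x_{t}(\vartheta)=x_{0}e^{\vartheta t}\) to the observations in the least-squares sense; a grid minimization stands in below for the exact minimization.

```python
import numpy as np

rng = np.random.default_rng(2)

# assumed model: dX = theta X dt + eps*sigma X dW (hypothetical parameter values)
theta0, sigma, eps = 0.5, 1.0, 0.001
x0, T, n = 1.0, 1.0, 10_000
dt = T / n

# exact simulation on the log-scale: d ln X = (theta - (eps*sigma)^2/2) dt + eps*sigma dW
dW = rng.normal(0.0, np.sqrt(dt), n)
logX = np.log(x0) + np.cumsum((theta0 - 0.5 * (eps * sigma) ** 2) * dt + eps * sigma * dW)
X = np.concatenate(([x0], np.exp(logX)))
t_grid = np.linspace(0.0, T, n + 1)

tau_idx = n // 10                        # learning interval [0, tau]
ts, Xs = t_grid[: tau_idx + 1], X[: tau_idx + 1]

# trajectory fitting: minimize sum_i (X_{t_i} - x0 e^{theta t_i})^2 over a grid
grid = np.linspace(-2.0, 3.0, 5001)
loss = ((Xs[None, :] - x0 * np.exp(np.outer(grid, ts))) ** 2).sum(axis=1)
theta_bar = grid[np.argmin(loss)]
print(theta_bar)
```

For small \(\varepsilon \) the fitted value is close to the true drift parameter, which is all the multi-step procedure requires of the preliminary estimator.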
Discussions
Note that we approximate the solution of the BSDE and not the equation itself. Of course, it is also possible to write the stochastic differential for \(Y_{t,\varepsilon }^{\star }\). For simplicity of notation, we consider the case \(k=1, d=1\). Indeed, the process \(Y_{t,\varepsilon }^{\star }=u\left (t,X_{t},\vartheta _{t,\varepsilon }^{\star }\right)\), where \(X_{t}\) has the stochastic differential (9) and \(\vartheta _{t,\varepsilon }^{\star } \) is given by (23), can be written as follows
with obvious notation. Therefore, the stochastic differential for \(Y_{t,\varepsilon }^{\star } \) can be written via the Itô formula.
It was shown that the right-hand side of (29) tends to the constant \(\vartheta _{0}\) as \(\varepsilon \rightarrow 0\), and we can verify that \(\mathrm {d}\vartheta _{t,\varepsilon }^{\star }\rightarrow 0\).
A more detailed analysis shows that
where the Gaussian process \(\eta _{t}\) is defined in Theorem 2. We used the relation \(Y_{t,\varepsilon }^{\star }=Y_{t}+\varepsilon \eta _{t}+o(\varepsilon) \).
The multi-step MLE-processes used in this work can be useful in similar problems of BSDE approximation for the discrete-time observation and ergodic diffusion models mentioned in the introduction (see (Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation) and Gasparyan and Kutoyants (2015)).
References
Bismut, JM: Conjugate convex functions in optimal stochastic control. J. Math. Anal. Appl. 44, 384–404 (1973).
El Karoui, N, Peng, S, Quenez, M: Backward stochastic differential equations in finance. Math. Fin. 7, 1–71 (1997).
Fisher, RA: Theory of statistical estimation. Proc. Cambridge Philosophical Society. 22, 700–725 (1925).
Freidlin, MI, Wentzell, AD: Random Perturbations of Dynamical Systems. 2nd Ed. Springer, NY (1998).
Gasparyan, S, Kutoyants, YA: On approximation of the BSDE with unknown volatility in forward equation. Armenian J. Math. 7(1), 59–79 (2015).
Ibragimov, IA, Has’minskii, RZ: Statistical Estimation - Asymptotic Theory. Springer, New York (1981).
Jeganathan, P: Some asymptotic properties of risk functions when the limit of the experiment is mixed normal. Sankhya: The Indian Journal of Statistics. 45(Series A, Pt.1), 66–87 (1983).
Kamatani, K, Uchida, M: Hybrid multi-step estimators for stochastic differential equations based on sampled data. Statist. Inference Stoch. Processes. 18(2), 177–204 (2015).
Kutoyants, YA: Identification of Dynamical Systems with Small Noise. Kluwer Academic Publisher, Dordrecht (1994).
Kutoyants, YA: On approximation of the backward stochastic differential equation. Small noise, large samples and high frequency cases. Proc. Steklov Inst. Math. 287, 133–154 (2014).
Kutoyants, YA: On Multi-Step MLE-Process for Ergodic Diffusion. arXiv:1504.01869 [math.ST] (2015).
Kutoyants, YA, Motrunich, A: On multi-step MLE-process for Markov sequences. Metrika. 79(6), 705–724 (2016).
Kutoyants, YA, Zhou, L: On approximation of the backward stochastic differential equation. (arXiv:1305.3728). J. Stat. Plann. Infer. 150, 111–123 (2014).
Le Cam, L: On the asymptotic theory of estimation and testing hypotheses. In: Proc. 3rd Berkeley Symposium, vol. 1, pp. 129–156 (1956).
Lehmann, EL, Romano, JP: Testing Statistical Hypotheses. 3rd ed. Springer, NY (2005).
Liptser, R, Shiryaev, AN: Statistics of Random Processes. Vols. 1 and 2, 2nd ed. Springer, NY (2001).
Pardoux, E, Peng, S: Adapted solution of a backward stochastic differential equation. System Control Letter. 14, 55–61 (1990).
Pardoux, E, Peng, S: Backward stochastic differential equations and quasilinear parabolic partial differential equations. Stochastic Partial Differential Equations and their Applications. Springer, Berlin (1992). (Lect. Notes Control Inf. Sci. 176).
Robinson, PM: The stochastic difference between econometric statistics. Econometrica. 56(3), 531–548 (1988).
Skorohod, AV, Khasminskii, RZ: On parameter estimation by indirect observations. Prob. Inform. Transm. 32, 58–68 (1996).
Uchida, M, Yoshida, N: Adaptive Bayes type estimators of ergodic diffusion processes from discrete observations. Statist. Inference Stoch. Processes. 17(2), 181–219 (2014).
Acknowledgments
This work was done with partial financial support of the RSF grant number 14-49-10079.
Competing interests
I declare that there are no competing interests.
Authors’ contributions
I read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Kutoyants, Y.A. On approximation of BSDE and multi-step MLE-processes. Probab Uncertain Quant Risk 1, 4 (2016). https://doi.org/10.1186/s41546-016-0005-0
Keywords
- Backward stochastic differential equation
- Parameter estimation
- Multi-step MLE-process
- Small noise
- Black and Scholes model