Open Access

# On approximation of BSDE and multi-step MLE-processes

Probability, Uncertainty and Quantitative Risk 2016 1:4

DOI: 10.1186/s41546-016-0005-0

Accepted: 15 June 2016

Published: 16 August 2016

## Abstract

We consider the problem of approximating the solution of backward stochastic differential equations in the Markovian case. We suppose that the forward equation depends on some unknown finite-dimensional parameter. The approximation is based on the solution of partial differential equations and on multi-step estimator-processes of the unknown parameter. As the model of observations of the forward equation we take a diffusion process with small volatility. First we establish a lower bound on the errors of all approximations and then we propose an approximation which is asymptotically efficient in the sense of this bound. The results are illustrated by the example of the Black and Scholes model.

### Keywords

Backward stochastic differential equation; Parameter estimation; Multi-step MLE-process; Small noise; Black and Scholes model

62M05

## Introduction

We consider the problem of approximation of the solution of the backward stochastic differential equation (BSDE) in the so-called Markovian case. Let us recall some basics of BSDEs. We are given a stochastic differential equation (called forward)
$$\mathrm{d} X_{t}=S(t,X_{t})\;\mathrm{d} t+ \sigma (t,X_{t})\;\mathrm{d} W_{t},\ \ X_{0}=x_{0},\ 0\leq t\leq T,$$
where S(t,x) is the drift coefficient, $$\sigma (t,x)^{2}$$ is the diffusion coefficient, and $$W_{t},0\leq t\leq T$$ is a standard Wiener process. In addition, we have two functions f(t,x,y,z) and Φ(x), and we must construct a couple of processes $$(Y_{t},Z_{t},0\leq t\leq T)$$ such that the solution of the equation
$$\mathrm{d} Y_{t}=-f(t,X_{t},Y_{t},Z_{t})\;\mathrm{d} t+Z_{t}\;\mathrm{d} W_{t},\ \;Y_{0}, \; 0\leq t\leq T,$$
(called backward) has the final value $$Y_{T}=\Phi (X_{T})$$. Such BSDEs were first introduced by Bismut (1973) in the linear case, and the general theory was developed by Pardoux and Peng (1990). The Markovian case considered in this work was studied by Pardoux and Peng (1992); see also Section 4 in El Karoui et al. (1997). This model is also called a forward-backward stochastic differential equation (FBSDE) (El Karoui et al. 1997).
The construction of the backward equation is realized as follows. Suppose that u(t,x) satisfies the parabolic partial differential equation
$$\frac{\partial u}{\partial t}+S\left(t,x\right)\frac{\partial u}{\partial x}+\frac{1}{2} \sigma \left(t,x\right)^{2}\frac{\partial^{2} u}{\partial x^{2}}=-f\left(t,x,u, \sigma \left(t,x\right)\frac{\partial u}{\partial x}\right),$$
with the final condition u(T,x)=Φ(x). Let $$Y_{t}=u(t,X_{t})$$ and $$Z_{t}=\sigma (t,X_{t})\,u'_{x}(t,X_{t})$$. Then, by Itô’s formula
$$\begin{array}{@{}rcl@{}} {\mathrm{d}}Y_{t}&&=\left[\frac{\partial u}{\partial t}\left(t,X_{t}\right)+S\left(t,X_{t}\right)\frac{\partial u}{\partial x}\left(t,X_{t}\right)+\frac{1}{2} \sigma \left(t,X_{t}\right)^{2}\frac{\partial^{2} u}{\partial x^{2}}\left(t,X_{t}\right) \right]\,{\mathrm{d}}t\\ &&\qquad \qquad + \sigma \left(t,X_{t}\right)\frac{\partial u}{\partial x}\left(t,X_{t}\right)\,{\mathrm{d}}W_{t}\\ &&=-f\left(t,X_{t},Y_{t},Z_{t}\right)\,{\mathrm{d}}t+Z_{t}\,{\mathrm{d}}W_{t},\qquad Y_{0}=u\left(0,X_{0}\right),\quad 0\leq t\leq T. \end{array}$$

The final value is $$Y_{T}=u(T,X_{T})=\Phi (X_{T})$$. Therefore, if we have the solution u(t,x), then we immediately obtain the solution $$(Y_{t},Z_{t})$$ of the BSDE.
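As a quick illustration of this construction (a toy case chosen here for its closed form, not an example from the paper), let S(t,x)=ax and σ(t,x)=bx with constants a and b, and take f≡0 and Φ(x)=x. The function

$$u(t,x)=x\,e^{a(T-t)}$$

satisfies the PDE with the required final condition, since

$$\frac{\partial u}{\partial t}+ax\frac{\partial u}{\partial x}+\frac{1}{2}b^{2}x^{2}\frac{\partial^{2} u}{\partial x^{2}}=-ax\,e^{a(T-t)}+ax\,e^{a(T-t)}+0=0,\qquad u(T,x)=x,$$

and therefore

$$Y_{t}=X_{t}\,e^{a(T-t)},\qquad Z_{t}=\sigma (t,X_{t})\,u'_{x}(t,X_{t})=bX_{t}\,e^{a(T-t)}.$$

In this toy case the drift constant a enters u explicitly, so computing $$Y_{t}$$ from the observed $$X_{t}$$ requires either knowing a or replacing it by an estimate.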

We are interested in the problem of approximation of $$(Y_{t},Z_{t},0\leq t\leq T)$$ in the situation where the forward equation contains some unknown finite-dimensional parameter 𝜗:
$$\mathrm{d} X_{t}=S(\vartheta,t,X_{t})\;\mathrm{d} t+ \sigma (\vartheta,t,X_{t})\;\mathrm{d} W_{t},\ \ X_{0}=x_{0},\ 0\leq t\leq T.$$
Then the solution of the PDE also depends on 𝜗: u=u(t,x,𝜗). We cannot simply let $$Y_{t}=u(t,X_{t},\vartheta)$$ because we do not know 𝜗. Of course, the natural way to approximate $$Y_{t}$$ and $$Z_{t}$$ is to estimate first the unknown parameter 𝜗 with the help of some estimator $$\bar {\vartheta }$$ and then to put, say, $${\bar {Y}}_{t}=u(t,X_{t},{\bar {\vartheta }})$$. We can guess that if $$\bar {\vartheta }$$ is a good estimator of 𝜗, then $${\bar {Y}}_{t}$$ will be a good estimator of $$Y_{t}$$. There are several problems that are interesting to study in this framework. We must understand what conditions imposed on the estimator $$\bar {\vartheta }$$ allow us to say that it is good. We consider that a good estimator has the following properties.
1. To estimate $$Y_{t}$$ we need an estimator constructed from the observations of the solution of the forward equation up to time t, i.e., $${\bar {\vartheta }}_{t} = {\bar {\vartheta }}_{t} \left (X_{s},0\leq s\leq t\right)$$, $$0<t\leq T$$.

2. As we need such an estimator for all $$t\in (0,T]$$, its calculation must be relatively simple.

3. The error of estimation, say, $${\mathbf {E}}_{\vartheta _{0}}\left ({\bar {\vartheta }}_{t}-\vartheta _{0}\right)^{2}$$, must be as small as possible.

Therefore $$\bar {\vartheta }$$ is an estimator-process $${\bar {\vartheta }} =\left ({\bar {\vartheta }}_{t},0<t\leq T\right)$$. Of course, the construction of such an estimator-process is an intermediate problem. The main problem is to obtain good approximations of $$Y_{t}$$ and $$Z_{t}$$. In particular, we must show that the approximations
$${\bar{Y}}_{t}=u(t,X_{t},{\bar{\vartheta}}_{t}),\qquad \quad {\bar{Z}}_{t}=u'_{x}(t,X_{t},{\bar{\vartheta}}_{t})\,\sigma ({\bar{\vartheta}}_{t},t,X_{t})$$
are in some sense asymptotically optimal, i.e., it is impossible to have approximations of these processes with asymptotic errors smaller than that of $${\bar {Y}}_{t}$$ and $${\bar {Z}}_{t}$$.
The goal of the study initiated in Kutoyants and Zhou (2014) is to realize such a program for three models of observations of the forward equation. As is usual in statistics, we consider situations where it is possible to have a consistent estimation of the unknown parameters and processes. Therefore, we are interested in the following well-known models of observations.
• Diffusion process with an unknown parameter in the drift coefficient and small noise or small volatility
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,t,X_{t}\right)\mathrm{d}t+\varepsilon \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad x_{0}, \; 0\leq t\leq T. \end{array}$$
(1)

Here the time T of observations $$X^{T}=(X_{t},0\leq t\leq T)$$ is fixed and the limit corresponds to ε→0.

• Diffusion process
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(t,X_{t}\right)\mathrm{d}t+ \sigma \left(\vartheta,t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T, \end{array}$$
(2)

observed at the discrete times $$X^{n}=\left (X_{t_{0}},X_{t_{1}},\ldots X_{t_{n}}\right)$$, $$t_{i}=i\frac {T}{n}$$. Here the unknown parameter is in the volatility coefficient and the limit corresponds to n→∞ (high frequency model of observations). The time T of observations is fixed.

• Ergodic diffusion process
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,X_{t}\right)\mathrm{d}t+\sigma \left(X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T. \end{array}$$
(3)

Here the unknown parameter 𝜗 is in the drift coefficient, we have continuous time observations $$X^{T}=(X_{t},0\leq t\leq T)$$ and the limit corresponds to T→∞.

Of course, there are other possible statements. For example, one can consider a mixture of the discrete-time and ergodic diffusion models. This corresponds to the equation
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,X_{t}\right)\mathrm{d}t+\sigma \left(\vartheta,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T_{n} \end{array}$$

and observations $$X^{n}=\left (X_{t_{0}},X_{t_{1}},\ldots X_{t_{n}}\right)$$. Here $$\max_{i}|t_{i}-t_{i-1}|\rightarrow 0$$ and $$T_{n}\rightarrow \infty$$. Such a model of parameter estimation was studied, e.g., in Kamatani and Uchida (2015); Uchida and Yoshida (2014). It is possible to consider the mixture of discrete-time and small noise models, to consider the model with $$X_{t}\rightarrow \pm \infty$$, or the models with null recurrent forward equation, etc. It will be interesting to see the statements of the statistical problems in non-Markovian cases for more general models.

Let us describe the general framework of the statistical study of the above-mentioned three models (1)-(3). For each model we propose an estimator-process $$\vartheta _{t}^{\star },0< t\leq T$$ such that $$Y^{\star }_{t}=u(t,X_{t},\vartheta _{t}^{\star })\rightarrow Y_{t}$$ and the error of approximation $${\mathbf {E}}_{\vartheta } \left (Y^{\star }_{t}-Y_{t} \right)^{2}$$ is asymptotically minimal. In the earlier works Kutoyants and Zhou (2014); Gasparyan and Kutoyants (2015); (Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation) (see the review of these works in Kutoyants (2014)) we considered the approximation of the solution of BSDEs with a learning interval of fixed length.

The optimality of estimators of $$Y_{t}$$ and $$Z_{t}$$ is understood as follows. We define for each model a normalization function φ→0, i.e., $$\varphi_{\varepsilon}\rightarrow 0$$ as ε→0, $$\varphi_{n}\rightarrow 0$$ as n→∞, and $$\varphi_{T}\rightarrow 0$$ as T→∞.

We propose the lower bounds on the risks of all estimators
$$\begin{array}{@{}rcl@{}} \lim\limits_{\overline{\delta \rightarrow 0}}\lim\limits_{\overline{\varepsilon,n,T}}\sup\limits_{\left|\vartheta -\vartheta_{0}\right|<\delta }{\mathbf{E}}_{\vartheta} \left|\frac{{\bar{Y}}_{t}-Y_{t}}{\varphi }\right|^{2}\geq D\left(\vartheta_{0}\right)^{2}, \end{array}$$
which allow us to define the asymptotically efficient estimators $$Y_{t}^{\star }$$ of Y t as follows
$$\begin{array}{@{}rcl@{}} \lim\limits_{\delta \rightarrow 0}\lim\limits_{\varepsilon,n,T}\sup_{\left|\vartheta -\vartheta_{0}\right|<\delta }{\mathbf{E}}_{\vartheta} \left|\frac{Y_{t}^{\star}-Y_{t}}{\varphi }\right|^{2}= D\left(\vartheta_{0}\right)^{2}. \end{array}$$

We suppose that the last equality takes place for all $$\vartheta_{0}\in \Theta$$ and all $$t\in (0,T]$$. We also have a similar bound in the problem of estimation of $$Z_{t}$$. For models (1) and (3) these bounds are slight modifications of the Hajek-Le Cam lower bound (Ibragimov and Has’minskii 1981) and for model (2) the lower bound is similar to Jeganathan’s lower bound (Jeganathan 1983).

We take the quadratic loss function just for simplicity of exposition. For all the models mentioned, similar lower bounds can be established and the corresponding estimator-processes constructed for more general loss functions.

The approximation of the solution of BSDEs in the Markovian case was initiated in Kutoyants and Zhou (2014), where the model of small volatility was considered. The parameter 𝜗 was supposed to be one-dimensional and the approximation-process $$Y_{t,\varepsilon }^{\star }$$ was defined for $$t\in [\tau,T]$$, where τ>0 is a fixed value.

In the work Gasparyan and Kutoyants (2015), we considered the model of discrete-time observations (2) and the one-step MLE-process, which allowed us to construct an estimator-process $$Y_{t_{k},n}^{\star }$$ for the values $$t_{k,n}\in [\tau,T]$$, where τ>0 is fixed.

The case of ergodic diffusion process is considered in the work (Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation), which is still in progress.

The main contribution of the present work is due to a new class of estimator-processes called multi-step MLE-processes, introduced in Kutoyants (2015). These estimator-processes allow us to construct the approximations of the solutions of BSDEs for the three above-mentioned models with learning intervals that are vanishing (models (1) and (2)) or negligible with respect to the whole volume of observations (model (3)). Here we consider model (1) only; models (2) and (3) are left for later study.

In the present work, we consider the small volatility model, where we suppose that the unknown parameter is multi-dimensional and the approximation process $$Y_{t,\varepsilon }^{\star }$$ is defined for $$t\in [\tau_{\varepsilon},T]$$, where $$\tau_{\varepsilon}\rightarrow 0$$. This approximation allows us to consider the case $$\tau_{\varepsilon}=\varepsilon^{\delta}\rightarrow 0$$ and, moreover, to choose δ close to 2. The relations between the choice of δ and the multi-step MLE-processes are the following. If $$\delta \in (0,1)$$, then we use the one-step MLE-process $$\vartheta _{t,\varepsilon }^{\star }$$; if $$\delta \in [1,\frac {4}{3})$$, then we use the two-step MLE-process $$\vartheta _{t,\varepsilon }^{\star \star }$$; if $$\delta \in [\frac {4}{3}, \frac {3}{2})$$, then we use the three-step MLE-process $$\vartheta _{t,\varepsilon }^{\star \star \star }$$.

In the work Kutoyants (2015) we already studied the multi-step MLE-process for the ergodic diffusion process, and the structure of the estimator-process proposed in the present work is quite similar.

Note that the multi-step MLEs, like the well-known one-step ML-estimators, are based on the so-called Fisher-score device proposed by Fisher (1925) and studied by Le Cam (1956). Let us recall this construction. Suppose that we have n i.i.d. r.v.’s $$X^{n}=(X_{1},\ldots,X_{n})$$ with smooth density function f(𝜗,x) and denote $$\ell (\vartheta,x)=\ln f(\vartheta,x)$$. The maximum likelihood equation is
$$\begin{array}{@{}rcl@{}} \sum\limits_{j=1}^{n}\dot \ell\left({\hat{\vartheta}}_{n},X_{j}\right)=0. \end{array}$$
Here and in the rest of the paper the dot means differentiation w.r.t. 𝜗. If we expand it in the vicinity of the true value $$\vartheta_{0}$$, we obtain
$$\begin{array}{@{}rcl@{}} \sum\limits_{j=1}^{n}\dot \ell\left(\vartheta_{0},X_{j}\right)+\left({\hat{\vartheta}}_{n}-\vartheta_{0} \right)\sum\limits_{j=1}^{n}\ddot \ell\left({\tilde{\vartheta}}_{n},X_{j}\right)=0. \end{array}$$
Therefore
$$\begin{array}{@{}rcl@{}} {\hat{\vartheta}}_{n}=\vartheta_{0}-\frac{{\sum\nolimits}_{j=1}^{n}\dot \ell\left(\vartheta_{0},X_{j}\right) }{{\sum\nolimits}_{j=1}^{n}\ddot \ell\left({\tilde{\vartheta}}_{n},X_{j}\right)}=\vartheta_{0}+\frac{1}{\sqrt{n}}\frac{\frac{1}{\sqrt{n}}{\sum\nolimits}_{j=1}^{n}\dot \ell\left(\vartheta_{0},X_{j}\right) }{-\frac{1}{n}{\sum\nolimits}_{j=1}^{n}\ddot \ell\left({\tilde{\vartheta}}_{n},X_{j}\right)}. \end{array}$$
(4)
Note that
$$\begin{array}{@{}rcl@{}} -\frac{1}{n}\sum_{j=1}^{n}\ddot \ell\left(\vartheta_{0},X_{j}\right)\longrightarrow {\mathbb{I}}\left(\vartheta_{0}\right)=\int_{}^{} \dot \ell\left(\vartheta_{0},x\right)^{2}f\left(\vartheta_{0},x\right)\mathrm{d}x, \end{array}$$
(5)

where $${\mathbb {I}}\left (\vartheta _{0}\right)$$ is the Fisher information.

Suppose that we have a preliminary estimator $${\bar {\vartheta }}_{n}$$ such that
$$\sqrt{n}\left({\bar{\vartheta}}_{n}-\vartheta_{0} \right)\Longrightarrow {\mathcal{N}}\left(0, D\left(\vartheta_{0}\right)\right),\qquad D\left(\vartheta_{0}\right)>{\mathbb{I}}\left(\vartheta_{0}\right)^{-1 }.$$
Keeping in mind the relations (4)-(5), the one-step MLE $$\vartheta _{n}^{\star }$$ is defined as follows
$$\begin{array}{@{}rcl@{}} \vartheta_{n}^{\star}={\bar{\vartheta}}_{n}+\frac{1}{\sqrt{n}}\frac{\Delta_{n}\left({\bar{\vartheta}}_{n},X^{n}\right) }{{\mathbb{I}}\left({\bar{\vartheta}}_{n}\right)},\qquad \quad \Delta_{n}\left(\vartheta,X^{n}\right)=\frac{1}{\sqrt{n} }\sum_{j=1}^{n}\dot \ell\left(\vartheta,X_{j}\right). \end{array}$$
This estimator is already asymptotically efficient because its limit variance is $$\mathbb{I}(\vartheta_{0})^{-1}$$:
$$\begin{array}{@{}rcl@{}} \sqrt{n}\left(\vartheta_{n}^{\star}-\vartheta_{0} \right)\Longrightarrow {\mathcal{N}}\left(0, {\mathbb{I}}\left(\vartheta_{0}\right)^{-1 }\right). \end{array}$$

Therefore, this Fisher-score device allows us to improve the preliminary estimator up to asymptotically efficient (see details, e.g., in Lehmann and Romano (2005)).
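The improvement described above can be checked numerically. The following is a minimal sketch under assumptions that are not in the paper: the Cauchy location family is chosen because its Fisher information is known to be 1/2, and the sample median serves as the preliminary estimator (its limit variance π²/4 exceeds the efficient value 2). The names `score` and `one_step_mle` are illustrative.

```python
import numpy as np

FISHER_INFO = 0.5  # Fisher information of the Cauchy location family (unit scale)


def score(theta, x):
    """Derivative w.r.t. theta of the Cauchy log-density ln f(theta, x)."""
    d = x - theta
    return 2.0 * d / (1.0 + d * d)


def one_step_mle(x):
    """Preliminary estimator (sample median) plus one Fisher-score correction."""
    n = x.size
    prelim = np.median(x)
    delta_n = score(prelim, x).sum() / np.sqrt(n)  # normalized score at prelim
    return prelim + delta_n / (np.sqrt(n) * FISHER_INFO)


rng = np.random.default_rng(0)
theta0, n, reps = 1.0, 400, 2000  # hypothetical true value and sample sizes
samples = theta0 + rng.standard_cauchy((reps, n))

medians = np.median(samples, axis=1)
one_steps = np.array([one_step_mle(row) for row in samples])

# Normalized mean-square errors; the efficient limit is 1 / FISHER_INFO = 2.
risk_median = n * np.mean((medians - theta0) ** 2)
risk_one_step = n * np.mean((one_steps - theta0) ** 2)
print(f"median: {risk_median:.3f}, one-step: {risk_one_step:.3f}")
```

In this simulation the normalized risk of the one-step estimator should come out close to 2 and below that of the median, which is the behaviour the Fisher-score device is meant to deliver.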

Moreover, this device can be applied even in the case of preliminary estimator with the rate of convergence worse than $$\sqrt {n}$$ (see, e.g., Robinson (1988); Kamatani and Uchida (2015)). For continuous-time stochastic processes such a construction was used, for example, in Skorohod and Khasminskii (1996).

The one-step MLE-process, introduced in Kutoyants (2015), for this model of observations can be written as follows. Let us denote by $${\bar {\vartheta }}_{N}$$ the preliminary estimator constructed from the first $$N=[n^{\delta}]$$ observations $$X^{N}=(X_{1},\ldots,X_{N})$$ with $$\delta \in (\frac {1}{2},1)$$. Then the one-step MLE-process $$\vartheta _{n}^{\star }=\left (\vartheta _{k,n}^{\star }, N+1\leq k\leq n\right)$$ is defined by the equality
$$\begin{array}{@{}rcl@{}} \vartheta_{k,n}^{\star}={\bar{\vartheta}}_{N}+\mathbb{I}\left({\bar{\vartheta}}_{N}\right)^{-1 }\frac{1}{k}\sum_{j=N+1}^{k} \dot\ell\left({\bar{\vartheta}}_{N},X_{j} \right) \end{array}$$
(6)
and for $$k=[sn]$$, $$s\in (0,1]$$ we have the convergence
$$\begin{array}{@{}rcl@{}} \sqrt{k}\left(\vartheta_{k,n}^{\star}-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0, {\mathbb{I}}\left(\vartheta_{0}\right)^{-1 }\right). \end{array}$$

Here s is fixed and n→∞. Therefore $$\vartheta _{n}^{\star }$$ is a good estimator-process, i.e., $$\vartheta _{k,n}^{\star }$$ depends on $$X^{k}=(X_{1},\ldots,X_{k})$$, is easy to calculate and is asymptotically efficient because it is asymptotically equivalent to the MLE. For the details see Kutoyants and Motrunich (2016).
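The running estimator (6) is cheap to compute because the preliminary estimator is evaluated once and the score sum is only updated. Here is a minimal sketch, again assuming (this is not from the paper) the Cauchy location family with Fisher information 1/2 and the median of the learning sample as preliminary estimator; the function name and the value δ=0.75 are illustrative choices.

```python
import numpy as np


def one_step_mle_process(x, delta=0.75, fisher_info=0.5):
    """Return (k, estimates) for k = N+1, ..., n following equality (6).

    Assumes a Cauchy location model; the preliminary estimator is the
    median of the first N = [n**delta] observations.
    """
    n = x.size
    N = int(n ** delta)                      # size of the learning sample
    prelim = np.median(x[:N])                # preliminary estimator
    d = x[N:] - prelim
    scores = 2.0 * d / (1.0 + d * d)         # score at prelim for j = N+1..n
    k = np.arange(N + 1, n + 1)
    estimates = prelim + np.cumsum(scores) / (k * fisher_info)
    return k, estimates


rng = np.random.default_rng(0)
theta0, n = -0.5, 20000                      # hypothetical true value and sample size
x = theta0 + rng.standard_cauchy(n)
k, est = one_step_mle_process(x)
print(k[0], est[-1])
```

A single cumulative sum produces the whole trajectory, so the cost is linear in n, in contrast to solving the maximum likelihood equation anew for every k.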

The one-step MLE-process in the case of the ergodic diffusion forward equation (3) can be illustrated as follows. Suppose that we have a preliminary estimator $${\bar {\vartheta }}_{T^{\delta }}$$ constructed from the observations $$X^{T^{\delta } }=\left (X_{t}, 0 \leq t\leq T^{\delta }\right)$$ with $$\delta \in (\frac {1}{2},1]$$. Then the one-step MLE-process $$\vartheta ^{\star }_{t,T}, T^{\delta } <t\leq T$$ based on the Fisher-score device (4), (6) has the following form
$$\begin{array}{@{}rcl@{}} \vartheta^{\star}_{t,T}={\bar{\vartheta}}_{T^{\delta}}+{{\mathbb{I}}\left({\bar{\vartheta}}_{T^{\delta}}\right)^{-1} }\int_{T^{\delta} }^{t}\frac{\dot S\left({\bar{\vartheta}}_{T^{\delta}},X_{s}\right)}{t\;\sigma \left(X_{s}\right)^{2}}\left[ \mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{T^{\delta}},X_{s}\right)\mathrm{d}s\right]. \end{array}$$
(7)
This estimator-process is asymptotically efficient ($$t=rT$$, $$r\in (0,1]$$)
$$\begin{array}{@{}rcl@{}} \sqrt{t}\left(\vartheta^{\star}_{t,T}-\vartheta_{0} \right)\Longrightarrow {\mathcal{N}}\left(0,{\mathbb{I}}\left(\vartheta_{0}\right)^{-1 }\right) \end{array}$$
(see Kutoyants (2015)) and provides asymptotically efficient estimator-processes
$$\begin{array}{@{}rcl@{}} Y_{t,T}^{\star}=u\left(t,X_{t},\vartheta^{\star}_{t,T}\right),\qquad \quad Z_{t,T}^{\star}=u'_{x}\left(t,X_{t},\vartheta^{\star}_{t,T}\right) \sigma \left(X_{t}\right) \end{array}$$
(8)

of the solution $$(Y_{t},Z_{t})$$ of the BSDE.

## Forward equation with small volatility

We are given the function f(t,x,y,z) defined on $$\left [0,T\right ]\times {\mathcal {R}}^{k}\times {\mathcal {R}}\times {\mathcal {R}}^{k}$$, function $$\Phi (x), x\in {\mathcal {R}}^{k}$$ and k-dimensional diffusion process (forward)
$$\begin{array}{@{}rcl@{}} \mathrm{d}X_{t}=S\left(\vartheta,t,X_{t}\right)\mathrm{d}t+\varepsilon \sigma \left(t,X_{t}\right)\,\mathrm{d}W_{t}, \quad X_{0}, \; 0\leq t\leq T. \end{array}$$
(9)

Here $$\vartheta \in \Theta \subset {\mathcal {R}}^{d}$$, Θ is an open bounded set and $$W_{t}=\left ({W^{1}_{t}},\ldots,{W^{k}_{t}}\right),0\leq t\leq T$$ is a standard k-dimensional Wiener process.

Introduce the condition $${\mathfrak L}$$.

The functions f(t,x,y,z), Φ(x), the vector $$S(\vartheta,t,x)=(S_{l}(\vartheta,t,x), l=1,\ldots,k)$$ and the k×k matrix $$\sigma (t,x)=(\sigma_{lm}(t,x))$$ are smooth
$$\begin{array}{@{}rcl@{}} &&\left|S\left(\vartheta,t,x\right)-S\left(\vartheta,t,y\right) \right|+\left|\sigma \left(t,x\right)-\sigma \left(t,y\right) \right|\leq L\left|x-y\right|,\\ &&\left|f\left(t,x,y_{1},z_{1}\right)-f\left(t,x,y_{2},z_{2}\right) \right|\leq C \left[\left|y_{1}-y_{2}\right|+ \left|z_{1}-z_{2}\right|\right] \end{array}$$
and satisfy (p>0)
$$\begin{array}{@{}rcl@{}} &&\left| S \left(\vartheta,t,x\right)\right|+\left| \sigma \left(t,x\right)\right|\leq C\left(1+\left|x\right|\right),\\ &&\left|f\left(t,x,y,z\right)\right|+\left|\Phi (x)\right|\leq C\left(1+ \left|x\right|^{p}\right). \end{array}$$
We must find a couple of stochastic processes $$\left (Y_{t,\varepsilon }^{\star }, Z_{t,\varepsilon }^{\star },0\leq t\leq T\right)$$ which approximate well the solution of the BSDE
$$\mathrm{d}Y_{t}=-f\left(t,X_{t},Y_{t},Z_{t}\right)\,\mathrm{d}t+Z_{t}\,\mathrm{d}W_{t},\qquad Y_{0},\quad 0\leq t\leq T$$
(10)

satisfying the condition $$Y_{T}=\Phi (X_{T})$$.

Let us denote $$x_{t}\left (\vartheta \right)=\left (x_{t}^{(1)}(\vartheta),\ldots, x_{t}^{(k)}(\vartheta)\right),0\leq t\leq T$$ the solution of the system of ordinary differential equations
$$\begin{array}{@{}rcl@{}} \frac{\mathrm{d}x_{t}(\vartheta) }{\mathrm{d}t}=S\left(\vartheta,t,x_{t}(\vartheta)\right),\qquad x_{0},\qquad 0\leq t\leq T. \end{array}$$
The true value is $$\vartheta_{0}$$ and we let $$x_{t}=x_{t}(\vartheta_{0})$$. We have the estimates: with probability 1
$$\begin{array}{@{}rcl@{}} \sup\limits_{0\leq t\leq T}\left|X_{t}-x_{t}\right|\leq C\varepsilon \sup\limits_{0\leq t\leq T} \left|W_{t}\right| \end{array}$$
(11)
and for any p>0
$$\begin{array}{@{}rcl@{}} &\sup\limits_{0\leq t\leq T}{\mathbf{E}}_{\vartheta_{0}}\left|X_{t}-x_{t}\right|^{p}\leq C\varepsilon^{p}. \end{array}$$
(12)

For the proof see, e.g., Kutoyants (1994).
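The pathwise estimate (11) can be observed in a small simulation. The sketch below rests on assumptions that are not in the paper: a scalar linear drift S(𝜗,t,x)=−𝜗x, σ≡1, and an Euler discretization on a uniform grid. For this particular scheme a discrete Gronwall argument gives the constant C=exp(𝜗T), which is used in the comparison.

```python
import numpy as np


def simulate(theta, eps, dW, x0=1.0, T=1.0):
    """Euler scheme for dX = -theta*X dt + eps dW and for the limit ODE."""
    n = dW.size
    h = T / n
    X = np.empty(n + 1)   # perturbed (observed) path
    x = np.empty(n + 1)   # deterministic limit path (eps = 0)
    X[0] = x[0] = x0
    for i in range(n):
        X[i + 1] = X[i] - theta * X[i] * h + eps * dW[i]
        x[i + 1] = x[i] - theta * x[i] * h
    return X, x


rng = np.random.default_rng(1)
n_steps, T, theta0 = 2000, 1.0, 1.0          # hypothetical grid and parameter
dW = rng.normal(0.0, np.sqrt(T / n_steps), n_steps)
sup_W = np.max(np.abs(np.cumsum(dW)))        # sup of |W| over the grid

sup_err = {}
for eps in (0.1, 0.01):
    X, x = simulate(theta0, eps, dW, T=T)
    sup_err[eps] = np.max(np.abs(X - x))
    bound = np.exp(theta0 * T) * eps * sup_W
    print(f"eps={eps}: sup|X-x|={sup_err[eps]:.5f}, bound={bound:.5f}")
```

Because the same Wiener increments are reused for both values of ε, the output makes the linear scaling of the deviation in ε directly visible.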

We have a family of problems of parameter estimation by observations $$X^{t}=(X_{s},0\leq s\leq t)$$, where $$t\in (0,T]$$, and therefore we need a family of estimators $${\bar {\vartheta }}_{t,\varepsilon }, 0<t\leq T$$. Let $$\left ({\mathcal {C}}^{k}\left (\left [0,t\right ]\right),{\mathfrak B}_{t}\right)$$ be a measurable space of continuous vector-functions on [0,t] with Borelian σ-algebra $${\mathfrak B}_{t}$$. Denote by $$\left \{\mathbf {P}_{\vartheta }^{\left (\varepsilon,t \right)},\vartheta \in \Theta \right \}$$ the family of measures induced in this space by the solutions of (9) with different $$\vartheta \in \Theta$$. Note that these measures are equivalent (see Liptser and Shiryaev (2001)) and the likelihood ratio function is
$$\begin{array}{@{}rcl@{}} L\left(\vartheta,X^{t}\right)&=&\frac{\mathrm{d} \mathbf{P}_{\vartheta }^{\left(\varepsilon,t \right)}}{\mathrm{d} \mathbf{P}_{0 }^{\left(\varepsilon,t \right)}}\left(X^{t}\right)=\exp\left\{\frac{1}{\varepsilon^{2}}{{\int\nolimits}_{0}^{t}} S\left(\vartheta,s,X_{s}\right)^{*} \mathbb{A} \left(s,X_{s}\right)^{-1}\mathrm{d}X_{s}\right.\\ && \left. -\frac{1}{2\varepsilon^{2}}{{\int\nolimits}_{0}^{t}} S\left(\vartheta,s,X_{s}\right)^{*} \mathbb{A} \left(s,X_{s}\right)^{-1}S\left(\vartheta,s,X_{s}\right)\mathrm{d}s\right\},\quad \vartheta \in \Theta. \end{array}$$
Here $$\mathbf {P}_{0 }^{\left (\varepsilon,t \right)}$$ is the measure which corresponds to the observations (9) with $$S(\vartheta,t,X_{t})\equiv 0$$. The matrix $$\mathbb {A}\left (s,x\right)$$ is
$$\begin{array}{@{}rcl@{}} \mathbb{A}_{lm}\left(s,x\right)=\left[\sigma \left(s,x\right)^{*}\sigma \left(s,x\right)\right]_{lm},\qquad l,m=1,\ldots,k. \end{array}$$
Recall that the MLE $${\hat {\vartheta }}_{\varepsilon,t}$$ is defined by the equation
$$\begin{array}{@{}rcl@{}} L\left({\hat{\vartheta}}_{\varepsilon,t},X^{t}\right)=\sup\limits_{\vartheta \in \Theta }L\left(\vartheta,X^{t}\right). \end{array}$$
(13)
Introduce the Regularity conditions $${\mathfrak R}$$.
1. The function S(𝜗,t,x) is twice continuously differentiable w.r.t. 𝜗 and the derivatives are Lipschitz in x.

2. We suppose that there exists a positive constant m such that for any $$\lambda \in {\mathcal {R}}^{k}$$ we have
$$\begin{array}{@{}rcl@{}} m^{-1}\left\|\lambda \right\|^{2}\leq \lambda^{*} \mathbb{A}\left(s,x\right)\lambda \leq m\left\|\lambda \right\|^{2}. \end{array}$$
(14)

3. The Fisher information matrix
$$\begin{array}{@{}rcl@{}} \mathbb{I}_{t}(\vartheta)={{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta,s,x_{s}(\vartheta)\right)^{*}\mathbb{A}\left(s,x_{s}(\vartheta)\right)^{-1}\dot{\mathbb{S}}\left(\vartheta,s,x_{s}(\vartheta)\right)\,\mathrm{d}s, \end{array}$$
is uniformly nondegenerate:
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta \in\Theta }\inf\limits_{\left|\lambda \right|=1} \lambda^{*}\mathbb{I}_{t}(\vartheta)\lambda >0 \end{array}$$

Here $$\lambda \in {\mathcal {R}}^{d}$$, the dot means differentiation w.r.t. 𝜗, and $$\dot {\mathbb {S}}\left (\vartheta,s,x\right)$$ is a k×d matrix.

4. Identifiability condition: for any ν>0 and any $$t\in (0,T]$$ the estimate
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta_{0}\in\Theta }\inf\limits_{\left|\vartheta -\vartheta_{0}\right|>\nu }{{\int\nolimits}_{0}^{t}}\delta \left(s,x_{s},\vartheta,\vartheta_{0}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\delta \left(s,x_{s},\vartheta,\vartheta_{0}\right)\mathrm{d}s>0 \end{array}$$

holds. Here $$\delta (s,x_{s},\vartheta,\vartheta_{0})=S(\vartheta,s,x_{s})-S(\vartheta_{0},s,x_{s})$$.

The Regularity conditions allow us to prove the following properties of the MLE $${\hat {\vartheta }}_{\varepsilon,t}, t\in (0,T]$$.
1. It is uniformly consistent: for any ν>0 and any compact $$\textbf{K}\subset \Theta$$
$$\begin{array}{@{}rcl@{}} \lim\limits_{\varepsilon \rightarrow 0}\sup\limits_{\vartheta_{0}\in \textbf{K}} \mathbf{P}_{\vartheta_{0} }^{\left(\varepsilon,t \right)}\left(\left|{\hat{\vartheta}}_{\varepsilon,t}-\vartheta_{0} \right|>\nu \right)=0. \end{array}$$

2. It is asymptotically normal, uniformly on compacts $$\textbf{K}\subset \Theta$$:
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left({\hat{\vartheta}}_{\varepsilon,t}-\vartheta_{0} \right)\Longrightarrow {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta_{0} \right)^{-1}\right). \end{array}$$

3. The polynomial moments converge and it is asymptotically efficient.

These properties were established in Kutoyants (1994) in the case of the one-dimensional diffusion processes (9). There are no essential difficulties in applying the same proof in our case. The multi-step MLE-processes presented below have exactly the same asymptotic properties, but can be calculated more easily.
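The asymptotic normality of the MLE with limit covariance given by the inverse Fisher information can be illustrated by Monte Carlo. This is a rough sketch under assumptions that are not in the paper: the scalar model dX = −𝜗X dt + ε dW with σ≡1 and $$X_{0}=x_{0}$$, for which the likelihood ratio above is maximized in closed form by −∫X dX / ∫X² ds and the Fisher information is $$x_{0}^{2}(1-e^{-2\vartheta t})/(2\vartheta)$$. Integrals are replaced by Euler sums.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, x0, t_end = 1.0, 1.0, 1.0            # hypothetical true value and horizon
eps, n_steps, reps = 0.05, 400, 2000
h = t_end / n_steps

dW = rng.normal(0.0, np.sqrt(h), (reps, n_steps))
X = np.full(reps, x0)
num = np.zeros(reps)                          # accumulates the sum of X dX
den = np.zeros(reps)                          # accumulates the sum of X^2 ds
for i in range(n_steps):
    dX = -theta0 * X * h + eps * dW[:, i]
    num += X * dX
    den += X * X * h
    X = X + dX

theta_hat = -num / den                        # closed-form MLE for this linear drift

# Fisher information: integral of x_s(theta0)^2 over [0, t_end]
fisher = x0**2 * (1.0 - np.exp(-2.0 * theta0 * t_end)) / (2.0 * theta0)
emp_var = np.mean(((theta_hat - theta0) / eps) ** 2)
print(f"empirical: {emp_var:.3f}, 1/I_t: {1.0 / fisher:.3f}")
```

The empirical normalized variance should be close to the inverse Fisher information for small ε; the agreement is only approximate because of the Monte Carlo error and the time discretization.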

Introduce the family of functions
$$\begin{array}{@{}rcl@{}} {\mathcal{U}}=\left\{\left(u(t,x,\vartheta), t\in \left[0,T\right], x\in {\mathcal{R}}^{k}\right), \vartheta \in \Theta \right\} \end{array}$$
such that for all $$\vartheta \in \Theta$$ the function u(t,x,𝜗) satisfies the PDE
$$\begin{array}{@{}rcl@{}} &&\frac{\partial u}{\partial t}+\sum\limits_{l=1}^{k}S_{l}(\vartheta,t,x)\frac{\partial u}{\partial x_{l}} +\frac{\varepsilon^{2}}{2}\sum\limits_{l,m=1}^{k}\mathbb{A}_{l,m}\left(t,x\right) \frac{\partial^{2} u}{\partial x_{l}\partial x_{m}}\\ &&\qquad \qquad \qquad =-f\left(t,x,u,\varepsilon\sum\limits_{l=1}^{k} \sigma_{lm} (t,x)\frac{\partial u}{\partial x_{m}}\right) \end{array}$$

and the condition $$u(T,x,\vartheta)=\Phi (x), x\in {\mathcal {R}}^{k}$$.

The limit of the function u(t,x,𝜗) as ε→0 we denote by $$u_{\circ}(t,x,\vartheta)$$. The function $$u_{\circ}(t,x,\vartheta)$$ satisfies the equation
$$\begin{array}{@{}rcl@{}} &\frac{\partial u_{\circ}}{\partial t}+\sum\limits_{l=1}^{k}S_{l}(\vartheta,t,x)\frac{\partial u_{\circ}}{\partial x_{l}} =-f\left(t,x,u_{\circ},\varepsilon\sum\limits_{l=1}^{k} \sigma_{lm} (t,x)\frac{\partial u_{\circ}}{\partial x_{m}}\right) \end{array}$$

with the final value $$u_{\circ}(T,x,\vartheta)=\Phi (x)$$. Below $$\dot u_{\circ }\left (t,x,\vartheta \right)$$ and $$\ddot u_{\circ }\left (t,x,\vartheta \right)$$ denote the first and second derivatives of this function w.r.t. 𝜗.

Introduce the condition $${\mathfrak U}$$
1. The function u(t,x,𝜗) is twice continuously differentiable w.r.t. 𝜗 and the derivatives $$\dot u\left (t,x,\vartheta \right)$$ and $$\ddot u\left (t,x,\vartheta \right)$$ are Lipschitz w.r.t. x uniformly in $$\vartheta \in \Theta$$.

2. The function u(t,x,𝜗) and its derivatives $$\dot u\left (t,x,\vartheta \right)$$ and $$\dot u'_{x}\left (t,x,\vartheta \right)$$ converge uniformly in $$t\in [0,T]$$ to $$u_{\circ }\left (t,x,\vartheta \right),\dot u_{\circ }\left (t,x,\vartheta \right),\dot u_{\circ,x}'\left (t,x,\vartheta \right)$$ respectively.

The sufficient conditions providing these properties of u(t,x,𝜗) can be found in Freidlin and Wentzell (1998), Theorem 2.3.1. Note that the derivatives $$\dot u\left (t,x,\vartheta \right)$$ and $$\ddot u\left (t,x,\vartheta \right)$$ satisfy the linear PDE of the same type.

If we let $$Y_{t}=u(t,X_{t},\vartheta)$$, then by Itô’s formula we obtain BSDE (10) with
$$Z_{t}=\left({Z_{t}^{1}},\ldots, {Z_{t}^{k}}\right),\qquad {Z_{t}^{m}}=\varepsilon \sum\limits_{l=1}^{k}\sigma_{ml} \left(t,X_{t}\right) u'_{x_{l}} \left(t,X_{t},\vartheta \right).$$

Recall that our goal is to construct an asymptotically efficient approximation of the couple $$(Y_{t},Z_{t})$$. To compare all possible estimators we introduce the lower bounds on the mean-square risks. This is a version of the well-known Hajek-Le Cam minimax risk bound (see, e.g., Ibragimov and Has’minskii (1981), Theorem 2.12.1).

### Theorem 1

Suppose that the conditions $${\mathfrak L}, {\mathfrak R}$$ and $${\mathfrak U}$$ are fulfilled. Then for all estimators $${\bar {Y}}_{t}$$ and $${\bar {Z}}_{t}$$ and all $$t\in (0,T]$$ we have the relations
$$\begin{array}{@{}rcl@{}} && {\lim}_{\overline{\nu \rightarrow 0}}{\lim}_{\overline{\varepsilon \rightarrow 0}} \sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu} \varepsilon^{-2}{\mathbf{E}}_{\vartheta} \left| {\bar{Y}}_{t}-Y_{t}\right|^{2}\geq \dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)^{*}\mathbb{I}_{t}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right), \end{array}$$
(15)
$$\begin{array}{@{}rcl@{}} &&{\lim}_{\overline{\nu \rightarrow 0}}{\lim}_{\overline{\varepsilon \rightarrow 0}} \sup_{\left|\vartheta -\vartheta _{0}\right|\leq \nu} \varepsilon^{-4}{\mathbf{E}}_{\vartheta} \left| {\bar{Z}}_{t}-Z_{t}\right|^{2} \geq \left|{\left(\dot u_{\circ}\right)'_{x}\left(t,x_{t},\vartheta _{0}\right)^{*}{\mathbb{I}_{t}}\left(\vartheta_{0}\right)^{-\frac{1}{2}} \sigma \left(t,x_{t}\right)}\right|^{2}. \end{array}$$
(16)

### Proof

We first verify that the family of measures is locally asymptotically normal (LAN) and then we apply the proof of the Hajek-Le Cam lower bound (Ibragimov and Has’minskii 1981), which provides us with (15), (16). We present here the necessary modification of the proof given in Ibragimov and Has’minskii (1981). Usually this inequality is considered for a risk like $${\mathbf {E}}_{\vartheta } \left |{\bar {\vartheta }}_{\varepsilon } -\vartheta \right |^{2}$$, whereas we are interested in the risk $${\mathbf {E}}_{\vartheta } \left |{\bar {Y}}_{t,\varepsilon } -Y_{t} \right |^{2}$$, where $$Y_{t}$$ is a random process. Another point: the random vector $$\Delta_{t}$$ (see below) is, in general, only asymptotically normal, but in our case it has a Gaussian distribution, which is why the proof is slightly simplified. □

Let us denote $$\varphi _{\varepsilon } =\varepsilon {\mathbb {I}_{t}}^{-\frac {1}{2}}$$ where $${\mathbb {I}_{t}}={\mathbb {I}_{t}}\left (\vartheta _{0}\right)$$ and introduce the normalized likelihood ratio
$$\begin{array}{@{}rcl@{}} Z_{t,\varepsilon }(v)=\frac{L\left(\vartheta_{0}+\varphi_{\varepsilon} v,X^{t}\right)}{L\left(\vartheta_{0},X^{t}\right)},\qquad v\in V_{\varepsilon} =\left\{v: \vartheta_{0}+\varphi_{\varepsilon} v\in \Theta \right\}. \end{array}$$
We can write
$$\begin{array}{@{}rcl@{}} \ln Z_{t,\varepsilon }(v)&=&\frac{1}{\varepsilon }{{\int\nolimits}_{0}^{t}}\left[S\left(\vartheta_{0}+\varphi_{\varepsilon} v,s,X_{s}\right)-S\left(\vartheta_{0},s,X_{s}\right)\right]\sigma \left(s,X_{s}\right)^{-1}\mathrm{d}W_{s}\\ &&-\frac{1}{2\varepsilon^{2} }{{\int\nolimits}_{0}^{t}}\left|\left[S\left(\vartheta_{0}+\varphi_{\varepsilon} v,s,X_{s}\right)-S\left(\vartheta_{0},s,X_{s}\right)\right]^{*}\sigma \left(s,X_{s}\right)^{-1}\right|^{2}\mathrm{d}s\\ &=&v^{*}\Delta_{t }-\frac{1}{2}\left|v\right|^{2} +r_{\varepsilon}, \end{array}$$
where $$r_{\varepsilon}\rightarrow 0$$ and the vector
$$\begin{array}{@{}rcl@{}} \Delta_{t}={\mathbb{I}_{t}}\left(\vartheta_{0}\right)^{-\frac{1}{2}} {{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right) \sigma \left(s,x_{s}\right)^{-1}\mathrm{d}W_{s} \sim {\mathcal{N}} \left(0, \mathbb{J} \right). \end{array}$$

Here $$\mathbb {J}$$ is a unit d×d matrix.

Hence, the family of measures $$\left \{\mathbf {P}_{\vartheta }^{\left (\varepsilon,t \right)},\vartheta \in \Theta \right \}$$ is LAN in Θ (Ibragimov and Has’minskii 1981; Kutoyants 1994).

Below, M>0, $${\mathcal {K}}_{M}$$ is a cube in $${\mathcal {R}}^{d}$$ whose vertices have coordinates ±M, so that its volume is $$(2M)^{d}$$, and $$\vartheta_{v}=\vartheta_{0}+\varphi_{\varepsilon}v$$.
$$\begin{array}{@{}rcl@{}} &&\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu} {\mathbf{E}}_{\vartheta} \left| {\bar{Y}}_{t,\varepsilon }-Y_{t}\right|^{2}=\sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu } {\mathbf{E}}_{\vartheta} \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta \right)\right|^{2}\\ &&\qquad \quad =\sup\limits_{\left|\varphi_{\varepsilon} v \right|\leq \nu} {\mathbf{E}}_{\vartheta_{v}} \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\\ &&\qquad \quad\geq \sup\limits_{v\in {\mathcal{K}}_{M}} {\mathbf{E}}_{\vartheta_{v}} \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\\ &&\qquad \quad\geq \frac{1}{\left(2M\right)^{d}}{\int\nolimits}_{{\mathcal{K}}_{M}}^{}{\mathbf{E}}_{\vartheta_{v}} \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\mathrm{d}v\\ &&\qquad \quad =\frac{1}{\left(2M\right)^{d}}{\int\nolimits}_{{\mathcal{K}}_{M}}^{}{\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\mathrm{d}v. \end{array}$$
We have
$$\begin{array}{@{}rcl@{}} u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)&=&u\left(t,X_{t},\vartheta_{0} \right)+ \dot u\left(t,X_{t},\vartheta_{0} \right)^{*}\varphi_{\varepsilon} v\\ &&+\varepsilon {{\int\nolimits}_{0}^{1}}\left[\dot u\left(t,X_{t},\vartheta_{0}+r\varphi_{\varepsilon} v \right)-\dot u\left(t,X_{t},\vartheta_{0} \right)\right]^{*}{\mathbb{I}}_{t}^{-\frac{1}{2}} v\,\mathrm{d}r\\ & =& u\left(t,X_{t},\vartheta_{0} \right)+ \dot u\left(t,X_{t},\vartheta_{0} \right)^{*}\varphi_{\varepsilon} v+\varphi_{\varepsilon} h_{\varepsilon}, \end{array}$$
where $$|h_{\varepsilon}|\leq C\varepsilon$$. Hence, if we denote
$${\bar{b}}_{\varepsilon} =\varepsilon^{-1}\left({\bar{Y}}_{t}- u\left(t,X_{t},\vartheta_{0} \right) \right),\qquad \dot u=\dot u\left(t,X_{t},\vartheta_{0} \right)^{*} {\mathbb{I}}_{t}^{-\frac{1}{2}}$$
and introduce a vector $${\bar {v}}_{\varepsilon }$$ such that $${\bar {b}}_{\varepsilon } = \dot u\left (t,X_{t},\vartheta _{0} \right)^{*}{\mathbb {I}}_{t}^{-\frac {1}{2}}{\bar {v}}_{\varepsilon }$$, then we can write
$$\begin{array}{@{}rcl@{}} &&\varepsilon^{-2}{\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\\ &&\qquad\qquad \qquad\;={\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| {\bar{b}}_{\varepsilon}-\dot u\left(t,X_{t},\vartheta_{0} \right)^{*}{\mathbb{I}}_{t}^{-\frac{1}{2}} v\right|^{2}\left(1+O(\varepsilon)\right)\\ &&\qquad\qquad \qquad\; ={\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| \dot u^{*}\left({\bar{v}}_{\varepsilon}- v\right)\right|^{2}\left(1+O(\varepsilon)\right). \end{array}$$

Further, we use the following result, known as Scheffé’s lemma.

### Lemma 1

Let the random variables $$Z_{\varepsilon}\geq 0,\ \varepsilon\in(0,1]$$, converge in probability to the random variable Z≥0 as ε→0, and let $${\mathbf{E}}Z_{\varepsilon}={\mathbf{E}}Z=1$$. Then
$$\begin{array}{@{}rcl@{}} \lim\limits_{\varepsilon \rightarrow 0} {\mathbf{E}} \left|Z_{\varepsilon}- Z\right|=0. \end{array}$$

For the proof see, e.g., Theorem A.4 in Ibragimov and Has’minskii (1981).

Recall that $${\mathbf {E}}_{\vartheta _{0}} Z_{t,\varepsilon }(v)={\mathbf {E}}_{\vartheta _{0}} Z_{t }(v)=1$$, where $$\ln Z_{t }(v)=v^{*}\Delta _{t}-\frac {1}{2}\left |v\right |^{2}$$. Hence for any K>0
$$\begin{array}{@{}rcl@{}} {\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| \dot u^{*}\left({\bar{v}}_{\varepsilon}- v\right)\right|_{K}^{2}={\mathbf{E}}_{\vartheta_{0}}Z_{t }(v) \left| \dot u^{*}\left({\bar{v}}_{\varepsilon}- v\right)\right|_{K}^{2}\left(1+o(1)\right). \end{array}$$
Here we denoted $$\left |D\right |^{2}_{K}=\left |D\right |^{2}\wedge K$$. These allow us to write
$$\begin{array}{@{}rcl@{}} &&\frac{\varepsilon^{-2}}{\left(2M\right)^{d}}{\int\nolimits}_{{\mathcal{K}}_{M}}^{}{\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| {\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right|^{2}\mathrm{d}v\\ &&\qquad\geq \frac{1}{\left(2M\right)^{d}}{\int\nolimits}_{{\mathcal{K}}_{M}}^{}{\mathbf{E}}_{\vartheta_{0}}Z_{t,\varepsilon }(v) \left| \varepsilon^{-1}\left({\bar{Y}}_{t}-u\left(t,X_{t},\vartheta_{0}+\varphi_{\varepsilon} v \right)\right)\right|_{K}^{2}\mathrm{d}v\\ &&\qquad=\frac{1}{\left(2M\right)^{d}}{\int\nolimits}_{{\mathcal{K}}_{M}}^{}{\mathbf{E}}_{\vartheta_{0}}Z_{t }(v)\left| \dot u^{*}\left({\bar{v}}_{\varepsilon}- v\right)\right|_{K}^{2}\mathrm{d}v\left(1+o(1)\right). \end{array}$$
Then
$$\begin{array}{@{}rcl@{}} Z_{t }(v)=\exp\left\{v^{*}\Delta_{t}-\frac{1}{2}\left|v\right|^{2}\right\}=\exp\left\{-\frac{1}{2} \left|v-\Delta_{t} \right|^{2}\right\} \exp\left\{ \frac{1}{2}\left|\Delta_{t}\right|^{2}\right\} \end{array}$$
and
$$\begin{array}{@{}rcl@{}} &&{\mathbf{E}}_{\vartheta_{0}}Z_{t }(v)\left| \dot u^{*}\left({\bar{v}}_{\varepsilon}- v\right)\right|_{K}^{2}={\mathbf{E}}_{\vartheta_{0}}e^{-\frac{1}{2}\left|v-\Delta_{t} \right|^{2}}\left| \dot u^{*} \left({\bar{v}}_{\varepsilon}-v\right)\right|_{K}^{2}e^{\frac{1}{2} \left|\Delta_{t}\right|^{2}}\\ &&\qquad ={\mathbf{E}}_{\vartheta_{0}}e^{-\frac{1}{2}\left| w \right|^{2}}\left| \dot u^{*} \left({\bar{v}}_{\varepsilon}-\Delta_{t}- w\right)\right|_{K}^{2}e^{\frac{1}{2} \left|\Delta_{t}\right|^{2}}\\ &&\qquad ={\mathbf{E}}_{\vartheta_{0}}e^{-\frac{1}{2}\left| w \right|^{2}}\left| \dot u^{*} \left(\tilde w_{\varepsilon}- w\right)\right|_{K}^{2}e^{\frac{1}{2} \left|\Delta_{t}\right|^{2}}, \end{array}$$

where $$w=v-\Delta_{t}$$ and $$\tilde w_{\varepsilon } = {\bar {v}}_{\varepsilon } -\Delta _{t}$$. Introduce the set $${\mathcal {C}}_{M}$$ on which each coordinate of $$\Delta _{t}=\left (\Delta _{t}^{(1)},\ldots,\Delta _{t}^{(d)} \right)$$ is at most $$M-\sqrt {M}$$ in absolute value, i.e., $$\left |\Delta _{t}^{(l)} \right |\leq M-\sqrt {M}$$. Then

because $${\mathcal {K}}_{\sqrt {M}}\subset {\mathcal {C}}_{{M}}$$. By Anderson’s Lemma (see, e.g., Ibragimov and Has’minskii (1981), Lemma 2.10.2)
$$\begin{array}{@{}rcl@{}} {\int\nolimits}_{{\mathcal{K}}_{\sqrt{M}}}^{} \left| \dot u^{*} \left(\tilde w_{\varepsilon}- w\right)\right|_{K}^{2}e^{-\frac{1}{2}\left| w \right|^{2}}\mathrm{d}w\geq {\int\nolimits}_{{\mathcal{K}}_{\sqrt{M}}}^{} \left| \dot u^{*} w\right|_{K}^{2}e^{-\frac{1}{2}\left| w \right|^{2}}\mathrm{d}w. \end{array}$$
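The inequality above can be checked numerically in the simplest setting. The following is only a sketch under illustrative assumptions: dimension d=1, $$\dot u=1$$, a cube of half-width 2 and truncation level K=4. The function name `truncated_risk` and all numerical values are hypothetical choices made for this illustration. The truncated quadratic loss is symmetric with interval sublevel sets, and the Gaussian weight restricted to a symmetric interval is symmetric and unimodal, so the shifted integral should never fall below the centered one.

```python
import numpy as np

def truncated_risk(shift, half_width=2.0, cap=4.0, n=20001):
    """Integral over [-half_width, half_width] of min((shift - w)^2, cap) * exp(-w^2 / 2) dw."""
    w = np.linspace(-half_width, half_width, n)
    integrand = np.minimum((shift - w) ** 2, cap) * np.exp(-0.5 * w ** 2)
    dw = w[1] - w[0]
    # Trapezoidal rule written out by hand to avoid version-dependent helpers.
    return float(dw * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])))

# Risk for a range of shifts (the role of tilde-w_eps) versus the centered risk (shift 0).
shifts = np.linspace(-3.0, 3.0, 61)
risks = np.array([truncated_risk(c) for c in shifts])
centered = truncated_risk(0.0)

print("smallest excess over the centered risk:", float(np.min(risks - centered)))
```

The printed excess should be nonnegative up to quadrature error, which is the one-dimensional content of the lemma for this loss and weight.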

Note that as $$M\rightarrow \infty$$ we obtain the limits

and
$$\begin{array}{@{}rcl@{}} \frac{1}{\left(2\pi \right)^{\frac{d}{2}}}{\int\nolimits}_{{\mathcal{K}}_{\sqrt{M}}}^{} \left| \dot u^{*} w\right|_{K}^{2}e^{-\frac{1}{2}\left| w \right|^{2}}\mathrm{d}w&\longrightarrow {\mathbf{E}}_{\vartheta_{0}} \left|\dot u^{*}\Delta_{t}\right|_{K}^{2}. \end{array}$$
The last steps are $$\varepsilon\rightarrow 0$$ and $$K\rightarrow \infty$$:
$$\begin{array}{@{}rcl@{}} {\mathbf{E}}_{\vartheta_{0}} \left|\dot u^{*}\Delta_{t}\right|_{K}^{2}\longrightarrow \dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)^{*}\mathbb{I}_{t}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right). \end{array}$$

The detailed proof can be found in Ibragimov and Has’minskii (1981), Theorem 2.12.1.
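For convenience, the two elementary computations used in this argument can be written out. Here a denotes an arbitrary fixed vector in $${\mathcal {R}}^{d}$$, a symbol introduced only for this remark. The factorization of $$Z_{t}(v)$$ is a completion of the square, and the value of the limit follows from $${\mathbf{E}}_{\vartheta_{0}}\Delta_{t}\Delta_{t}^{*}=\mathbb{J}$$:
$$\begin{array}{@{}rcl@{}} v^{*}\Delta_{t}-\frac{1}{2}\left|v\right|^{2}&=&-\frac{1}{2}\left(\left|v\right|^{2}-2v^{*}\Delta_{t}+\left|\Delta_{t}\right|^{2}\right)+\frac{1}{2}\left|\Delta_{t}\right|^{2}=-\frac{1}{2}\left|v-\Delta_{t}\right|^{2}+\frac{1}{2}\left|\Delta_{t}\right|^{2},\\ {\mathbf{E}}_{\vartheta_{0}}\left|a^{*}\Delta_{t}\right|^{2}&=&a^{*}\,{\mathbf{E}}_{\vartheta_{0}}\left[\Delta_{t}\Delta_{t}^{*}\right]a=a^{*}\mathbb{J}\,a=\left|a\right|^{2}. \end{array}$$
With $$a={\mathbb{I}_{t}}\left(\vartheta_{0}\right)^{-\frac{1}{2}}\dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)$$ the second line gives the quadratic form $$\dot u_{\circ}^{*}\,{\mathbb{I}_{t}}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ}$$ that appears in the limit above.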

Therefore the bound (15) is verified. The bound (16) is proved in a similar way. Note that $$Z_{t}=\varepsilon u'_{x}(t,X_{t},\vartheta)\sigma(t,X_{t})$$. We write an arbitrary estimator $${\bar {Z}}_{t}$$ of $$Z_{t}$$ as $${\bar {Z}}_{t}=\varepsilon \tilde Z_{t}$$. Then, for $$\varepsilon ^{-1}\left ({\bar {Z}}_{t}-Z_{t}\right)$$ we follow the proof given above.

### Definition

Suppose that the conditions $${\mathfrak L}, {\mathfrak R}, {\mathfrak U}$$ are fulfilled. Then we call the estimator-processes $$Y_{t}^{*}, Z_{t}^{*}, 0<t\leq T$$ asymptotically efficient if for all $$\vartheta_{0}\in\Theta$$ and all $$t\in(0,T]$$ we have the equalities
$$\begin{array}{@{}rcl@{}} && \lim\limits_{\nu \rightarrow 0}\lim\limits_{\varepsilon \rightarrow 0} \sup\limits_{\left|\vartheta -\vartheta_{0}\right|\leq \nu} \varepsilon^{-2}{\mathbf{E}}_{\vartheta} \left| Y_{t}^{*}-Y_{t}\right|^{2}= \dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)^{*}\mathbb{I}_{t}\left(\vartheta_{0}\right)^{-1}\dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right), \end{array}$$
(17)
$$\begin{array}{@{}rcl@{}} &&\lim\limits_{\nu \rightarrow 0}\lim\limits_{\varepsilon \rightarrow 0} \sup\limits_{\left|\vartheta -\vartheta _{0}\right|\leq \nu} \varepsilon^{-4}{\mathbf{E}}_{\vartheta} \left| Z_{t}^{*}-Z_{t}\right|^{2} = \left|{\left(\dot u_{\circ}\right)'_{x}\left(t,x_{t},\vartheta _{0}\right)^{*}{\mathbb{I}_{t}}\left(\vartheta_{0}\right)^{-\frac{1}{2}} \sigma \left(t,x_{t}\right)}\right|^{2}. \end{array}$$
(18)
As we do not know the value of $$\vartheta$$, we propose first to estimate it using some estimator-process $$\vartheta_{\varepsilon,t}^{\star},0<t\leq T$$, and then to put
$$Y_{t}^{\star}=u\left(t,X_{t},\vartheta_{\varepsilon} ^{\star} \right),\qquad Z_{t}^{\star}=\varepsilon \sum\limits_{l=1}^{k} u'_{x_{l}} \left(t,X_{t},\vartheta_{\varepsilon}^{\star} \right)\sigma_{l} \left(t,X_{t}\right).$$

Recall that formally the MLE-process $${\hat {\vartheta }}_{\varepsilon,t}, 0<t\leq T$$ “solves” the problem: it can be shown that under the supposed regularity conditions the estimator-processes $${\hat {Y}}_{t,\varepsilon }=u(t,X_{t},{\hat {\vartheta }}_{\varepsilon,t})$$ and $${\hat {Z}}_{t,\varepsilon }=\varepsilon u'_{x}(t,X_{t},{\hat {\vartheta }}_{\varepsilon,t})\sigma \left (t,X_{t}\right)$$ are asymptotically efficient in the sense of the relations (17) and (18), respectively. This solution, however, cannot be called acceptable, because calculating $${\hat {\vartheta }}_{\varepsilon,t}$$ for all $$t\in(0,T]$$ is, in the general case, a computationally difficult problem. That is why we propose to use the so-called multi-step MLE-process (Kutoyants 2015), which is introduced as follows. First we construct a preliminary estimator $${\bar {\vartheta }}_{\tau _{\varepsilon } }$$ by the observations $$X^{\tau _{\varepsilon } }=\left (X_{s},0\leq s\leq \tau _{\varepsilon } \right)$$ on some learning interval $$[0,\tau_{\varepsilon}]$$, where $$\tau_{\varepsilon}=\varepsilon^{\delta}$$ with 0<δ<1, and then we propose an estimator-process $$\vartheta _{t,\varepsilon }^{\star }, \tau _{\varepsilon } \leq t\leq T$$ based on this preliminary estimator. Finally we show that the corresponding estimators, say, $$Y_{t,\varepsilon }^{\star }=u\left (t,X_{t},\vartheta _{t,\varepsilon }^{\star }\right), \tau _{\varepsilon } \leq t\leq T$$, are asymptotically efficient.

As a preliminary estimator we propose the minimum distance estimator (MDE) $${\bar {\vartheta }}_{\tau _{\varepsilon } }$$ defined by the relation
$$\begin{array}{@{}rcl@{}} \left\|X-{\hat{X}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)\right\|_{\tau_{\varepsilon} }^{2}=\inf_{\vartheta \in \Theta }\left\|X-{\hat{X}}\left(\vartheta \right)\right\|_{\tau_{\varepsilon} }^{2}=\inf_{\vartheta \in \Theta }{\int\nolimits}_{0}^{\tau_{\varepsilon} }\left[X_{t}-{\hat{X}}_{t}(\vartheta)\right]^{2}\,\mathrm{d} t. \end{array}$$
Here the family of random processes $$\left \{\left ({\hat {X}}_{t}\left (\vartheta \right),0\leq t\leq \tau _{\varepsilon } \right),\vartheta \in \Theta \right \}$$ is defined as follows
$$\begin{array}{@{}rcl@{}} {\hat{X}}_{t}(\vartheta)=x_{0}+{{\int\nolimits}_{0}^{t}}S\left(\vartheta,s,X_{s}\right)\mathrm{d}s,\qquad 0\leq t\leq \tau_{\varepsilon},\qquad \vartheta \in \Theta. \end{array}$$
These estimators were studied in Kutoyants (1994) in the case of fixed $$\tau_{\varepsilon}=\tau$$ and are also called trajectory fitting estimators, because we choose an estimator $${\bar {\vartheta }}_{\tau _{\varepsilon } }$$ that provides a trajectory $${\hat {X}}_{t}\left ({\bar {\vartheta }}_{\tau _{\varepsilon } }\right),0\leq t\leq \tau _{\varepsilon }$$ closest to the observations $$X_{t},0\leq t\leq \tau_{\varepsilon}$$. It was shown that if the conditions of regularity and the condition of identifiability: for any ν>0
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta_{0}\in \Theta }\inf\limits_{\left|\vartheta -\vartheta_{0}\right|>\nu }{\int\nolimits}_{0}^{\tau }\left| {{\int\nolimits}_{0}^{t}} \left[S\left(\vartheta,s,x_{s}\right)- S\left(\vartheta_{0},s,x_{s}\right)\right]\mathrm{d}s\right|^{2}\mathrm{d}t>0 \end{array}$$
(19)
hold and the matrix
$$\begin{array}{@{}rcl@{}} \mathbb{J}_{\tau} \left(\vartheta_{0}\right)={\int\nolimits}_{0}^{\tau} {{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s} \right)^{*}\mathrm{d}s {{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s} \right)\mathrm{d}s\,\mathrm{d}t \end{array}$$
is uniformly nondegenerate (below $$\lambda \in {\mathcal {R}}^{d}$$)
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta_{0}\in \Theta }\inf\limits_{\left|\lambda \right|=1}\lambda^{*}\mathbb{J}_{\tau} \left(\vartheta_{0}\right)\lambda >0, \end{array}$$
(20)
then the MDE is asymptotically normal
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left({\bar{\vartheta}}_{\tau} -\vartheta_{0}\right)\Longrightarrow \mathbb{J}_{\tau} \left(\vartheta_{0}\right)^{-1} {\int\nolimits}_{0}^{\tau} {{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s} \right)^{*}\mathrm{d}s\;{{\int\nolimits}_{0}^{t}}\sigma \left(s,x_{s} \right)^{*}\mathrm{d}W_{s}\;\mathrm{d}t. \end{array}$$
Note that if we have the Regularity condition 3 (identifiability) with T=τ, then the identifiability condition (19) is also fulfilled. Indeed, suppose that there exists $$\vartheta_{1}\neq\vartheta_{0}$$ such that
$$\begin{array}{@{}rcl@{}} {\int\nolimits}_{0}^{\tau }\left| {{\int\nolimits}_{0}^{t}} \left[S\left(\vartheta_{1},s,x_{s}\right)- S\left(\vartheta_{0},s,x_{s}\right)\right]\mathrm{d}s\right|^{2}\mathrm{d}t=0. \end{array}$$
Then for all $$t\in[0,\tau]$$
$$\begin{array}{@{}rcl@{}} {{\int\nolimits}_{0}^{t}} S\left(\vartheta_{1},s,x_{s}\right)\mathrm{d}s= {{\int\nolimits}_{0}^{t}}S\left(\vartheta_{0},s,x_{s}\right)\mathrm{d}s, \end{array}$$
which implies
$$\begin{array}{@{}rcl@{}} S\left(\vartheta_{1},s,x_{s}\right)= S\left(\vartheta_{0},s,x_{s}\right),\qquad 0\leq s\leq \tau. \end{array}$$

The last equality, of course, contradicts Regularity condition 3.
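To make the minimum distance estimator defined above concrete, here is a minimal sketch for the scalar forward equation of the Example section, $$\mathrm{d}X_{t}=\vartheta X_{t}\,\mathrm{d}t+\varepsilon \sigma X_{t}\,\mathrm{d}W_{t}$$. For this drift the fitted trajectory is $${\hat{X}}_{t}(\vartheta)=x_{0}+\vartheta \int_{0}^{t}X_{s}\,\mathrm{d}s$$, so the minimization is a least-squares problem with an explicit solution, provided the minimizer lies in Θ (an assumption made here). The function names, the grid, the seed and the parameter values are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_forward(theta, eps, sigma, x0, T, n, rng):
    """Exact grid simulation of dX = theta*X dt + eps*sigma*X dW (geometric Brownian motion)."""
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    log_steps = (theta - 0.5 * (eps * sigma) ** 2) * dt + eps * sigma * dW
    X = x0 * np.exp(np.concatenate(([0.0], np.cumsum(log_steps))))
    return X, dt

def mde_linear_drift(X, dt, n_learn):
    """Minimum distance estimator on the first n_learn grid steps for the drift S = theta*x.

    Minimizes the discretized integral of [X_t - x0 - theta * int_0^t X_s ds]^2 over theta.
    """
    x0 = X[0]
    X_learn = X[: n_learn + 1]
    # Running integral of the observed path (left-point rule).
    integral_X = np.concatenate(([0.0], np.cumsum(X_learn[:-1]) * dt))
    numerator = np.sum((X_learn - x0) * integral_X) * dt
    denominator = np.sum(integral_X ** 2) * dt
    return numerator / denominator

rng = np.random.default_rng(1)
theta0, eps, sigma, delta = 1.0, 0.01, 1.0, 0.5
X, dt = simulate_forward(theta0, eps, sigma, x0=1.0, T=1.0, n=10_000, rng=rng)
n_learn = int(round(eps ** delta / dt))  # learning interval [0, tau_eps], tau_eps = eps**delta
theta_bar = mde_linear_drift(X, dt, n_learn)
print("MDE based on the learning interval:", theta_bar)
```

With ε=0 the path is deterministic and the estimator recovers the true value up to discretization error, which gives a quick sanity check of the implementation.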

Now suppose that $$\tau_{\varepsilon}=\varepsilon^{\delta}$$ with δ<1 and the matrix
$$\begin{array}{@{}rcl@{}} \mathbb{C}\left(\vartheta_{0}\right)=\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right) \end{array}$$
is uniformly nondegenerate in $$\vartheta_{0}\in\Theta$$ (below $$\lambda \in {\mathcal {R}}^{d}$$)
$$\begin{array}{@{}rcl@{}} \inf\limits_{\vartheta_{0} \in\Theta }\inf\limits_{\left|\lambda \right|=1}\lambda^{*}\mathbb{C}\left(\vartheta_{0}\right)\lambda >0. \end{array}$$
(21)
Then, we can obtain the asymptotics
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left({\bar{\vartheta}}_{\tau_{\varepsilon}} -\vartheta_{0}\right)=\frac{3}{2 \sqrt{\tau_{\varepsilon} }} \mathbb{C}\left(\vartheta_{0}\right)^{-1}\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\sigma \left(0,x_{0}\right){{\int\nolimits}_{0}^{1}}\left[1-r^{2}\right]\mathrm{d}W(r)\left(1+o(1)\right). \end{array}$$
Note that
$$\begin{array}{@{}rcl@{}} \mathbb{J}_{\tau_{\varepsilon} }\left(\vartheta_{0}\right)&=&{\int\nolimits}_{0}^{\tau_{\varepsilon} }t^{2} \dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)\mathrm{d}t \left(1+o(1)\right)\\ &=&\frac{\tau_{\varepsilon} ^{3}}{3} \dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right) \left(1+o(1)\right) \end{array}$$
and
$$\begin{array}{@{}rcl@{}} &&{\int\nolimits}_{0}^{\tau_{\varepsilon}} {{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s} \right)^{*}\mathrm{d}s\;{{\int\nolimits}_{0}^{t}}\sigma \left(s,x_{s} \right)^{*}\mathrm{d}W_{s}\;\mathrm{d}t\\ &&\qquad \quad =\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\sigma \left(0,x_{0}\right){\int\nolimits}_{0}^{\tau_{\varepsilon}} t W_{t}\;\mathrm{d}t\left(1+o(1)\right)\\ &&\qquad \quad =\frac{\tau_{\varepsilon} ^{\frac{5}{2}}}{2}\dot{\mathbb{S}}\left(\vartheta_{0},0,x_{0} \right)^{*}\sigma \left(0,x_{0}\right){{\int\nolimits}_{0}^{1}}\left(1-r^{2}\right)\; \mathrm{d}W(r)\left(1+o(1)\right). \end{array}$$
Therefore, the family of random vectors $$\varepsilon ^{-1+\frac {\delta }{2}}\left ({\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}\right)$$ is asymptotically normal. Moreover, following Kutoyants (1994) it can be shown that the moments are bounded, i.e.,
$$\begin{array}{@{}rcl@{}} \sup\limits_{\vartheta_{0}\in \textbf{K}}{\mathbf{E}}_{\vartheta_{0}}\left| \varepsilon^{-1+\frac{\delta }{2}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0}\right) \right|^{p}<C, \end{array}$$
(22)

where the constant C=C(p)>0 does not depend on ε for all p>0.
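The change of variables used above for the stochastic term rests on the integration-by-parts identity $$\int_{0}^{\tau}tW_{t}\,\mathrm{d}t=\frac{1}{2}\int_{0}^{\tau}\left(\tau^{2}-t^{2}\right)\mathrm{d}W_{t}$$, after which the rescaling $$t=\tau r$$ produces the factor $$\tau^{5/2}/2$$. The sketch below checks the identity on one simulated path and evaluates the variance constant $$\int_{0}^{1}(1-r^{2})^{2}\,\mathrm{d}r=8/15$$ of the limiting Gaussian integral. The grid sizes, the seed and the value of τ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n = 0.5, 200_000
dt = tau / n
t = np.linspace(0.0, tau, n + 1)
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

# Left side: int_0^tau t W_t dt, approximated by a left-point Riemann sum.
lhs = np.sum(t[:-1] * W[:-1]) * dt
# Right side: (1/2) int_0^tau (tau^2 - t^2) dW_t, approximated by an Ito sum.
rhs = 0.5 * np.sum((tau ** 2 - t[:-1] ** 2) * dW)

# Variance of int_0^1 (1 - r^2) dW(r), computed with the midpoint rule.
r = (np.arange(100_000) + 0.5) / 100_000
variance_constant = float(np.mean((1.0 - r ** 2) ** 2))

print("pathwise difference:", abs(lhs - rhs))
print("variance constant:", variance_constant)
```

The two sums agree up to a discretization error of order dt, which is why a fine grid is used here.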

Let us introduce the one-step MLE-process $$\vartheta _{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T$$
$$\begin{array}{@{}rcl@{}} &&\vartheta_{t,\varepsilon }^{\star}={\bar{\vartheta}}_{\tau_{\varepsilon} }\\ &&\quad +\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)\mathrm{d}s\right]. \end{array}$$
(23)

Its properties are described in the following proposition.

### Proposition 1

Let the conditions $${\mathfrak L}, {\mathfrak R}$$ be fulfilled and $$\delta\in(0,1)$$. Then for all $$t\in(0,T]$$
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0} \right)\Rightarrow {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta_{0 }\right)^{-1} \right) \end{array}$$
and this estimator-process is asymptotically efficient. Moreover, we have the uniform consistency, i.e., for any ν>0
$$\begin{array}{@{}rcl@{}} \lim\limits_{\varepsilon \rightarrow 0}\sup\limits_{\vartheta_{0}\in \textbf{K}}\mathbf{P}_{\vartheta _{0}}^{(\varepsilon)}\left(\sup\limits_{\tau_{\varepsilon} \leq t\leq T}\left|\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0} \right|>\nu \right)=0. \end{array}$$

### Proof

Note that the estimator $$\vartheta _{t,\varepsilon }^{\star }$$ is defined for $$t\in[\tau_{\varepsilon},T]$$, but since $$\tau_{\varepsilon}\rightarrow 0$$, for any fixed t>0 we have $$t>\tau_{\varepsilon}$$ for all sufficiently small ε. □

Substituting the observations (9) gives the equality
$$\begin{array}{@{}rcl@{}} &&\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0}={\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0}+\varepsilon \mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\sigma \left(s,X_{s}\right)^{-1}\mathrm{d}W_{s}\\ &&\; +\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[S\left(\vartheta_{0 },s,X_{s}\right) -S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right) \right] \mathrm{d}s. \end{array}$$
Recall that the vector-process $$(X_{s},0\leq s\leq T)$$ converges uniformly in s to the deterministic vector-function $$(x_{s},0\leq s\leq T)$$ and the estimator $${\bar {\vartheta }}_{\tau _{\varepsilon }}$$ is consistent. Therefore, we have the convergence in probability
$$\begin{array}{@{}rcl@{}} &&\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\sigma \left(s,X_{s}\right)^{-1}\mathrm{d}W_{s}\\ &&\qquad \quad \longrightarrow \mathbb{I}_{t}\left(\vartheta _{0}\right)^{-1}{{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\sigma \left(s,x_{s}\right)^{-1}\mathrm{d}W_{s}\sim {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta _{0 }\right)^{-1} \right). \end{array}$$
For the other terms, we first write the Taylor expansion
$$\begin{array}{@{}rcl@{}} S\left(\vartheta_{0 },s,X_{s}\right) -S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)=\dot{\mathbb{S}}\left(\vartheta_{0},s,X_{s}\right)^{*}\left(\vartheta_{0}- {\bar{\vartheta}}_{\tau_{\varepsilon} }\right)+ O\left(\varepsilon^{2-\delta }\right) \end{array}$$
because $$\vartheta _{0}- {\bar {\vartheta }}_{\tau _{\varepsilon } }=O\left (\varepsilon ^{1-\frac {\delta }{2}}\right)$$. Then, we denote
$$\begin{array}{@{}rcl@{}} \mathbb{D}_{\varepsilon} =\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)-{\int\nolimits}_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A} \left(s,X_{s}\right)^{-1}\dot{\mathbb{S}}\left(\vartheta_{0},s,X_{s}\right)^{*}\mathrm{d}s \end{array}$$
and write
$$\begin{array}{@{}rcl@{}} && {\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0} \,+\,\mathbb{I}_{t}\!\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}\!\!{\int\nolimits}_{\tau _{\varepsilon} }^{t}\!\dot{\mathbb{S}}\!\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\!\mathbb{A}\left(s,X_{s}\right)^{-1}\left[S\left(\vartheta_{0 },s,X_{s}\right) -S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right) \right] \mathrm{d}s\\ &&\quad =\left({\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0}\right)\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}\mathbb{D}_{\varepsilon} +O\left(\varepsilon^{2-\delta }\right) \end{array}$$
The following estimate can be easily verified
$$\begin{array}{@{}rcl@{}} \mathbb{D}_{\varepsilon}=O\left(\varepsilon^{\delta} \right)+O\left(\varepsilon^{1-\frac{\delta }{2}}\right)+O(\varepsilon) \end{array}$$
because $$X_{s}-x_{s}=O(\varepsilon)$$, $${\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}= O\left (\varepsilon ^{1-\frac {\delta }{2}}\right)$$ and
$$\begin{array}{@{}rcl@{}} \mathbb{I}_{t}\left(\vartheta_{0}\right){-}\int_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right) \mathrm{d}s=O\left({\varepsilon^{\delta} }\right). \end{array}$$
Hence
$$\begin{array}{@{}rcl@{}} &&\varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0}\right)-\mathbb{I}_{t}\left(\vartheta_{0}\right)^{-1}{{\int\nolimits}_{0}^{t}}\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}\sigma \left(s,x_{s}\right)^{-1}\mathrm{d}W_{s}\\ &&\qquad \qquad =\varepsilon^{-1}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0} \right)O\left(\varepsilon^{1-\frac{\delta }{2}}\right)=O\left(\varepsilon^{1-\delta }\right) \longrightarrow 0. \end{array}$$
(24)

The uniform consistency can be shown by following the proof given in Kutoyants (2015), Theorem 1.
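The one-step MLE-process (23) can be sketched numerically for the scalar forward equation of the Example section, $$\mathrm{d}X_{t}=\vartheta X_{t}\,\mathrm{d}t+\varepsilon \sigma X_{t}\,\mathrm{d}W_{t}$$. The sketch relies on two assumptions that are not restated in this part of the text: $$\mathbb{A}(s,x)=\sigma(s,x)^{2}$$ and $${\mathbb{I}_{t}}(\vartheta)=\int_{0}^{t}\dot S(\vartheta,s,x_{s})^{2}\mathbb{A}(s,x_{s})^{-1}\mathrm{d}s=t/\sigma^{2}$$. Under them σ cancels and (23) reduces to $$\vartheta_{t,\varepsilon}^{\star}={\bar{\vartheta}}_{\tau_{\varepsilon}}+t^{-1}\int_{\tau_{\varepsilon}}^{t}\left[X_{s}^{-1}\mathrm{d}X_{s}-{\bar{\vartheta}}_{\tau_{\varepsilon}}\mathrm{d}s\right]$$. The function names and parameter values are hypothetical, and the preliminary value is a deliberately perturbed stand-in of order $$\varepsilon^{1-\delta/2}$$ rather than an actual MDE.

```python
import numpy as np

def simulate_forward(theta, eps, sigma, x0, T, n, rng):
    """Exact grid simulation of dX = theta*X dt + eps*sigma*X dW (geometric Brownian motion)."""
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    log_steps = (theta - 0.5 * (eps * sigma) ** 2) * dt + eps * sigma * dW
    X = x0 * np.exp(np.concatenate(([0.0], np.cumsum(log_steps))))
    return X, dt

def one_step_mle(X, dt, theta_prelim, n_learn):
    """One-step MLE-process on the grid points t >= tau for S = theta*x, sigma(s, x) = sigma*x.

    Implements theta_prelim + (1/t) * int_tau^t [dX_s / X_s - theta_prelim ds] with Ito sums,
    which is formula (23) under the assumptions stated in the lead-in.
    """
    dX = np.diff(X)
    score_steps = dX / X[:-1] - theta_prelim * dt
    cumulative = np.concatenate(([0.0], np.cumsum(score_steps[n_learn:])))
    t = dt * np.arange(n_learn, len(X))
    return t, theta_prelim + cumulative / t

rng = np.random.default_rng(2)
theta0, eps, sigma, delta = 1.0, 0.01, 1.0, 0.5
X, dt = simulate_forward(theta0, eps, sigma, x0=1.0, T=1.0, n=10_000, rng=rng)
n_learn = int(round(eps ** delta / dt))            # tau_eps = eps**delta
theta_prelim = theta0 + eps ** (1.0 - delta / 2.0)  # stand-in for the preliminary estimator
t, theta_star = one_step_mle(X, dt, theta_prelim, n_learn)

print("error of the preliminary value:", theta_prelim - theta0)
print("error of the one-step value at T:", theta_star[-1] - theta0)
```

In the noise-free case ε=0 the remaining error at time t is the preliminary error multiplied by $$\tau_{\varepsilon}/t$$, which reflects the $$O(\varepsilon^{\delta})$$ term in the estimate of $$\mathbb{D}_{\varepsilon}$$ above.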

Let us define the estimator-processes $$Y^{\star }_{\varepsilon } =\left (Y_{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T \right)$$ and $$Z^{\star }_{\varepsilon } =\left (Z_{t,\varepsilon }^{\star },\tau _{\varepsilon } \leq t\leq T \right)$$ as follows
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star}=u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star}\right),\qquad \quad Z_{t,\varepsilon }^{\star}=\varepsilon u'_{x}\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star}\right)\sigma \left(t,X_{t}\right). \end{array}$$

### Theorem 2

Suppose the conditions $${\mathfrak L},{\mathfrak R},{\mathfrak U}$$ and (21) hold. Then the estimator-processes $$Y_{\varepsilon }^{\star },Z_{\varepsilon }^{\star }$$ admit the representations
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star}&=&Y_{t}+\varepsilon \dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)^{*}\,\xi_{t}\left(\vartheta_{0}\right)\left(1+o(1) \right), \end{array}$$
(25)
$$\begin{array}{@{}rcl@{}} Z_{t,\varepsilon }^{\star}&=&Z_{t}+\varepsilon^{2}\dot u'_{\circ,x}\left(t,x_{t},\vartheta_{0}\right)^{*}\,\xi_{t}\left(\vartheta_{0}\right)\sigma \left(t,x_{t}\right)\left(1+o(1)\right), \end{array}$$
(26)
where the Gaussian process
$$\xi_{t}\left(\vartheta_{0}\right)={\mathbb{I}_{t}} \left(\vartheta_{0}\right)^{-1} {{\int\nolimits}_{0}^{t}}{\dot{\mathbb{S}}\left(\vartheta_{0},s,x_{s}\right)^{*}}{\sigma \left(s,x_{s}\right)^{-1}}\mathrm{d}W_{s},\qquad \tau _{\varepsilon} \leq t\leq T.$$
The random processes
$$\begin{array}{@{}rcl@{}} \eta_{t,\varepsilon }&=&\varepsilon^{-1} \left(Y_{t,\varepsilon }^{\star}-Y_{t}\right),\tau \leq t\leq T,\\ \zeta_{t,\varepsilon }&=&\varepsilon^{-2} \left(Z_{t,\varepsilon }^{\star}-Z_{t}\right),\tau \leq t\leq T \end{array}$$
for any $$\tau\in(0,T]$$ converge in probability to the processes
$$\begin{array}{@{}rcl@{}} \eta_{t}&=&\dot u_{\circ}\left(t,x_{t},\vartheta_{0}\right)^{*}\,\xi_{t}\left(\vartheta_{0}\right),\qquad \tau \leq t\leq T,\\ \zeta_{t}&=&\dot u'_{\circ,x}\left(t,x_{t},\vartheta_{0}\right)^{*}\,\xi_{t}\left(\vartheta_{0}\right)\sigma \left(t,x_{t}\right),\qquad \tau \leq t\leq T, \end{array}$$

respectively, uniformly in $$t\in[\tau,T]$$. Moreover, these approximations are asymptotically efficient in the sense of (17), (18).

### Proof

By the condition $${\mathfrak U}$$, we obtain the representation
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star}-Y_{t}&=&u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star} \right)-u\left(t,X_{t},\vartheta_{0} \right)=\dot u(t,X_{t}, \vartheta_{0})^{*}\left(\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0} \right)\left(1+o(1)\right),\\ Z_{t,\varepsilon}^{\star}-Z_{t}&=&\varepsilon \left[u'_{x}\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star} \right)-u'_{x}\left(t,X_{t},\vartheta_{0} \right)\right]\sigma \left(t,X_{t}\right)\\ &=&\varepsilon\dot u'_{x}\left(t,X_{t}, \vartheta_{0}\right)^{*}\left(\vartheta_{t,\varepsilon }^{\star}-\vartheta_{0} \right)\sigma \left(t,X_{t}\right)\left(1+o(1)\right), \end{array}$$
and for any $$\tau\in(0,T]$$ we have the convergence in probability
$$\begin{array}{@{}rcl@{}} &&\sup\limits_{\tau \leq t\leq T}\left|\dot u(t,X_{t},\vartheta_{0})-\dot u_{\circ}(t,x_{t}, \vartheta_{0}) \right|\leq \sup\limits_{\tau \leq t\leq T}\left|\dot u(t,X_{t},\vartheta_{0})-\dot u(t,x_{t}, \vartheta_{0}) \right|\\ &&\qquad \qquad +\sup\limits_{\tau \leq t\leq T}\left|\dot u(t,x_{t},\vartheta_{0})-\dot u_{\circ}(t,x_{t}, \vartheta_{0}) \right|\longrightarrow 0, \end{array}$$
(27)
$$\begin{array}{@{}rcl@{}} &&\sup\limits_{\tau \leq t\leq T}\left|\dot u'_{x}(t,X_{t},\vartheta_{0})-\dot u'_{\circ,x}(t,x_{t}, \vartheta_{0}) \right|\leq \sup\limits_{\tau \leq t\leq T}\left|\dot u'_{x}(t,X_{t},\vartheta_{0})-\dot u'_{x}(t,x_{t}, \vartheta_{0}) \right|\\ &&\qquad \qquad +\sup\limits_{\tau \leq t\leq T}\left|\dot u'_{x}(t,x_{t},\vartheta_{0})-\dot u'_{\circ,x}(t,x_{t}, \vartheta_{0}) \right|\longrightarrow 0. \end{array}$$
(28)

Therefore, the representations (25),(26) follow now from (24).

A more detailed analysis shows that the convergences o(1) in (24),(25) are uniform in $$t\in[\tau,T]$$ due to (11). Moreover, we have the convergence of moments uniform on compacts $$\vartheta_{0}\in\mathbf{K}$$ as well, because we have (12) and the moments of the preliminary estimator are bounded (22). Therefore, the estimates used above can also be written for the moments. This convergence of moments provides the asymptotic efficiency of the estimators $$Y^{\star }_{\varepsilon },Z^{\star }_{\varepsilon }$$.

The estimators $$Y^{\star }_{t,\varepsilon },Z^{\star }_{t,\varepsilon },\tau _{\varepsilon } \leq t\leq T$$ are given for the values $$t>\tau_{\varepsilon}=\varepsilon^{\delta}$$ with $$\delta\in(0,1)$$. It is of interest to have a shorter learning interval and, therefore, a longer estimation period for $$Y_{t},Z_{t}$$. That is why we propose the two-step MLE-process, which uses a preliminary estimator with a worse rate of convergence. Let us take $$\delta \in [1,\frac {4}{3})$$ and introduce the second preliminary estimator-process
$$\begin{array}{@{}rcl@{}} {\bar{\vartheta}}_{t,\varepsilon }={\bar{\vartheta}}_{\tau_{\varepsilon} }+\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}\int_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)\mathrm{d}s\right] \end{array}$$
and the two-step MLE-process $$\vartheta _{t,\varepsilon }^{\star \star },\tau _{\varepsilon } \leq t\leq T$$
$$\begin{array}{@{}rcl@{}} \vartheta_{t,\varepsilon }^{\star\star}&={\bar{\vartheta}}_{t,\varepsilon } +\mathbb{I}_{t}\left({\bar{\vartheta}}_{t,\varepsilon}\right)^{-1}\int_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{t,\varepsilon },s,X_{s}\right)\mathrm{d}s\right]. \end{array}$$
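For the preliminary estimator $${\bar{\vartheta}}_{\tau_{\varepsilon}}$$ we obtain the same estimate (22), but with a different $$\tau_{\varepsilon}$$. Further, for the preliminary estimator-process $${\bar{\vartheta}}_{t,\varepsilon}$$ calculations similar to those above provide the estimates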
$$\begin{array}{@{}rcl@{}} \varepsilon^{-\gamma }\left({\bar{\vartheta}}_{t,\varepsilon }-\vartheta_{0}\right)=\varepsilon^{-\gamma }\left|{\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0} \right|^{2}O(1)+o(1)=\varepsilon^{-\gamma+2-\delta }O(1) +o(1). \end{array}$$
For the two-step MLE-process we have
$$\begin{array}{@{}rcl@{}} &&\varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star\star}-\vartheta_{0}\right)=\varepsilon^{\gamma -\frac{\delta }{2}} \left(\varepsilon^{-\gamma }\left|{\bar{\vartheta}}_{t,\varepsilon }-\vartheta_{0}\right|\right)\;(\varepsilon^{-1+\frac{\delta }{2} }\left|{\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0} \right|)O(1)\\ &&\qquad \qquad \qquad\quad +\mathbb{I}_{t}\left(\vartheta_{0 }\right)^{-1}{\int\nolimits}_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left(\vartheta_{0 },s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\mathrm{d}W_{s}+o(1). \end{array}$$
Therefore if we take γ such that γ+δ<2 and $$\gamma -\frac {\delta }{2}>0$$, say, $$\gamma <\frac {2}{3}$$, then we obtain
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star\star}-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta_{0 }\right)^{-1} \right). \end{array}$$
Now the estimator-processes $$Y^{\star \star }_{\varepsilon },Z^{\star \star }_{\varepsilon }$$ defined with the help of two-step MLE-process
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star\star}&=&u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star\star}\right),\qquad \tau_{\varepsilon} \leq t\leq T,\\ Z_{t,\varepsilon }^{\star\star}&=&\varepsilon u'_{x}\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star\star}\right)\sigma \left(t,X_{t}\right),\qquad \tau_{\varepsilon} \leq t\leq T \end{array}$$

are known for the larger time interval $$[\tau_{\varepsilon},T]$$.

Of course, we can continue this process and reduce the learning interval once more by introducing the three-step MLE-process $$\vartheta _{t,\varepsilon }^{\star \star \star }$$ as follows. The learning interval is $$[0,\tau_{\varepsilon}]$$, $$\tau_{\varepsilon}=\varepsilon^{\delta}$$, where $$\delta \in [\frac {4}{3}, \frac {3}{2})$$. The first preliminary estimator-process is
$$\begin{array}{@{}rcl@{}} {\bar{\vartheta}}_{t,\varepsilon }&={\bar{\vartheta}}_{\tau_{\varepsilon} }+\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)\mathrm{d}s\right], \end{array}$$
the second is
$$\begin{array}{@{}rcl@{}} \bar{{\bar{\vartheta}}}_{t,\varepsilon }&={\bar{\vartheta}}_{t,\varepsilon }+\mathbb{I}_{t}\left({\bar{\vartheta}}_{t,\varepsilon }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S\left({\bar{\vartheta}}_{t,\varepsilon },s,X_{s}\right)\mathrm{d}s\right], \end{array}$$
and the three-step MLE-process
$$\begin{array}{@{}rcl@{}} {\vartheta}_{t,\varepsilon }^{\star\star\star}&=\bar{{\bar{\vartheta}}}_{t,\varepsilon }+\mathbb{I}_{t}(\bar{{\bar{\vartheta}}}_{t,\varepsilon })^{-1}\int_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}\left({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s}\right)^{*}\mathbb{A}\left(s,X_{s}\right)^{-1}\left[\mathrm{d}X_{s}-S(\bar{{\bar{\vartheta}}}_{t,\varepsilon },s,X_{s})\mathrm{d}s\right]. \end{array}$$
Similar calculations provide the relations $${\bar {\vartheta }}_{\tau _{\varepsilon } }-\vartheta _{0}=\varepsilon ^{1-\frac {\delta }{2}}O(1)$$,
$$\begin{array}{@{}rcl@{}} \varepsilon^{-\gamma_{1}}\left({\bar{\vartheta}}_{t,\varepsilon }-\vartheta_{0}\right)=\varepsilon^{2-\gamma_{1}-{\delta }} O(1),\qquad \quad \varepsilon^{-\gamma_{2}}\left(\bar{{\bar{\vartheta}}}_{t,\varepsilon }-\vartheta_{0}\right)=\varepsilon^{-\gamma_{2}+\gamma_{1}+1-\frac{\delta }{2}} O(1), \end{array}$$
and
$$\begin{array}{@{}rcl@{}} &&\varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star\star\star}-\vartheta_{0}\right)=\varepsilon^{\gamma_{2} -\frac{\delta }{2}} \left(\varepsilon^{-\gamma_{2} }\left|\bar{{\bar{\vartheta}}}_{t,\varepsilon }-\vartheta_{0}\right|\right)\;(\varepsilon^{-1+\frac{\delta }{2} }\left|{\bar{\vartheta}}_{\tau_{\varepsilon} }-\vartheta_{0} \right|)O(1)\\ &&\qquad \qquad \qquad +\mathbb{I}_{t}\left(\vartheta_{0 }\right)^{-1}{\int\nolimits}_{\tau_{\varepsilon} }^{t}\dot{\mathbb{S}}\left(\vartheta_{0 },s,x_{s}\right)^{*}\mathbb{A}\left(s,x_{s}\right)^{-1}\mathrm{d}W_{s}+o(1). \end{array}$$
Hence, if we choose $$\delta$$, $$\gamma_{1}$$ and $$\gamma_{2}$$ such that
$$\begin{array}{@{}rcl@{}} \delta <2,\qquad \gamma_{1}+\delta <2,\qquad \gamma_{1}-\gamma_{2}+1-\frac{\delta }{2}>0,\quad \gamma_{2}>\frac{\delta }{2}, \end{array}$$
then once more we obtain an asymptotically efficient MLE-process
$$\begin{array}{@{}rcl@{}} \varepsilon^{-1}\left(\vartheta_{t,\varepsilon }^{\star\star\star}-\vartheta_{0}\right)\Longrightarrow {\mathcal{N}}\left(0,\mathbb{I}_{t}\left(\vartheta_{0 }\right)^{-1} \right). \end{array}$$

Therefore, we obtain the corresponding approximations $$Y_{t,\varepsilon }^{\star \star \star },Z_{t,\varepsilon }^{\star \star \star }$$ for the values $$t\in[\tau_{\varepsilon},T]$$ with an essentially smaller $$\tau_{\varepsilon}$$ than in the case of the one-step MLE-process.
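
As a quick sanity check on the four inequalities for $$\delta$$, $$\gamma_{1}$$, $$\gamma_{2}$$ displayed above, the following Python sketch looks for one admissible pair. Eliminating $$\gamma_{2}$$ from the inequalities gives $$\delta-1<\gamma_{1}<2-\delta$$, so an admissible pair can exist only when $$\delta<\frac{3}{2}$$, which agrees with the upper end of the stated range of $$\delta$$. The function names are illustrative, and only the four displayed inequalities are encoded; any further restrictions on $$\gamma_{1}$$, $$\gamma_{2}$$ coming from earlier sections are not checked here.

```python
def satisfies(delta, gamma1, gamma2):
    """Check the four displayed inequalities for the three-step MLE-process."""
    return (
        delta < 2
        and gamma1 + delta < 2
        and gamma1 - gamma2 + 1 - delta / 2 > 0
        and gamma2 > delta / 2
    )


def admissible_gammas(delta):
    """Return one pair (gamma1, gamma2) satisfying the inequalities, or None.

    gamma2 must lie in (delta/2, gamma1 + 1 - delta/2); this interval is
    non-empty only if gamma1 > delta - 1, while gamma1 < 2 - delta is required.
    """
    lo, hi = delta - 1.0, 2.0 - delta
    if not (delta < 2.0 and lo < hi):
        return None
    gamma1 = 0.5 * (lo + hi)                      # midpoint of the gamma1 range
    g2_lo, g2_hi = delta / 2.0, gamma1 + 1.0 - delta / 2.0
    gamma2 = 0.5 * (g2_lo + g2_hi)                # midpoint of the gamma2 range
    return gamma1, gamma2


pair = admissible_gammas(1.4)      # a delta inside [4/3, 3/2)
boundary = admissible_gammas(1.5)  # delta = 3/2 leaves no room for gamma1
```

Here `pair` is a valid choice for $$\delta=1.4$$, while `boundary` is `None` because the range for $$\gamma_{1}$$ collapses at $$\delta=\frac{3}{2}$$.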

## Example

Black and Scholes model. Suppose that the forward equation is
$$\mathrm{d} X_{t}=\vartheta X_{t} \mathrm{d}t+\varepsilon\sigma X_{t} \mathrm{d}W_{t},\quad X_{0}=x_{0}>0, \quad 0\leq t\leq T,$$
and the functions $$f(x,y,z)=-\beta y-\gamma xz$$ and $$\Phi(x)$$ are given. The function $$\Phi(x)$$ is continuous and satisfies the condition $$|\Phi(x)|\leq C(1+|x|^{p})$$ with some constants $$C>0$$ and $$p>0$$. We have to find $$(Y_{t},Z_{t})$$ such that
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t}=\left[\beta Y_{t}+\gamma X_{t}Z_{t}\right]\mathrm{d}t+Z_{t}\mathrm{d}W_{t},\qquad 0\leq t\leq T, \end{array}$$

and $$Y_{T}=\Phi(X_{T})$$.

The corresponding PDE is
$$\begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}+\frac{\varepsilon^{2}\sigma^{2}x^{2}}{2}\frac{\partial^{2} u}{\partial x^{2}} +(\vartheta-\varepsilon\sigma\gamma)x\frac{\partial u}{\partial x}-\beta u=0,\qquad u(T,x,\vartheta)=\Phi(x). \end{array}$$
To write its solution we change the variables $$s=T-t, \bar x=\ln x$$ and let $$u\left (t,\bar x,\vartheta \right)= e^{\mu (\vartheta) \bar x+\lambda (\vartheta) s}v\left (s,\bar x,\vartheta \right)$$, where
$$\begin{array}{@{}rcl@{}} \mu(\vartheta) =\frac{2\varepsilon \sigma \gamma +\varepsilon \sigma^{2}-2\vartheta }{2\varepsilon^{2}\sigma^{2}},\qquad \lambda(\vartheta) =\beta +\frac{\left(2\varepsilon \sigma \gamma +\varepsilon \sigma^{2}-2\vartheta\right)^{2} }{8\varepsilon^{2}\sigma^{2} }. \end{array}$$
Then, we obtain the reduced equation
$$\begin{array}{@{}rcl@{}} \frac{\partial v}{\partial s}=\frac{\varepsilon^{2}\sigma^{2}}{2}\frac{\partial^{2} v}{\partial \bar x^{2}},\qquad \ 0\leq s\leq T, \qquad \quad v(0,\bar x,\vartheta)=e^{-\mu(\vartheta) \bar x}\Phi(e^{\bar x}), \end{array}$$
whose solution is well known:
$$\begin{array}{@{}rcl@{}} v(s,\bar x,\vartheta)=\frac{1}{\sqrt{2\pi s\varepsilon^{2}\sigma ^{2}}}{\int\nolimits}_{-\infty }^{\infty} \exp\left\{-\frac{\left(\bar x-z\right)^{2}}{2\varepsilon ^{2}\sigma^{2}s}\right\}e^{-\mu(\vartheta) z}\Phi \left(e^{z}\right)\mathrm{d}z. \end{array}$$
Let us fix $$\tau _{\varepsilon } =\varepsilon ^{\frac {3}{4}}$$ and introduce the preliminary TFE
$$\begin{array}{@{}rcl@{}} {\bar{\vartheta}}_{\tau_{\varepsilon} }=\frac{\int_{0}^{\tau_{\varepsilon} }\left(X_{t}-x_{0}\right)H_{t}\;\mathrm{d}t }{\int_{0}^{\tau_{\varepsilon} }{H_{t}^{2}}\;\mathrm{d}t},\qquad H_{t}={{\int\nolimits}_{0}^{t}}X_{s}\;\mathrm{d}s. \end{array}$$
Of course, we can also write the MLE $${\hat {\vartheta }}_{\tau _{\varepsilon } }$$
$$\begin{array}{@{}rcl@{}} {\hat{\vartheta}}_{\tau_{\varepsilon} }=\frac{1}{\tau_{\varepsilon}}{\int\nolimits}_{0 }^{\tau _{\varepsilon}} \frac{\mathrm{d}X_{s}}{X_{s}}, \end{array}$$
but as in our work we used the TFE, we show how to calculate $${\bar {\vartheta }}_{\tau _{\varepsilon } }$$. The Fisher information is $$\mathbb {I}_{t}\left (\vartheta \right)=t\sigma ^{-2}$$. The one-step MLE-process is
$$\begin{array}{@{}rcl@{}} \vartheta_{t,\varepsilon }^{\star}={\bar{\vartheta}}_{\tau_{\varepsilon} }+\frac{1}{t}{\int\nolimits}_{\tau_{\varepsilon} }^{t} \frac{1}{X_{s}}\left[\mathrm{d}X_{s}-\bar \vartheta_{\tau_{\varepsilon} }X_{s}\mathrm{d}s\right]. \end{array}$$
Moreover, it is easy to see that
$$\begin{array}{@{}rcl@{}} \vartheta_{t,\varepsilon }^{\star}&=&{\bar{\vartheta}}_{\tau _{\varepsilon}}+\frac{1}{t}{\int\nolimits}_{\tau_{\varepsilon} }^{t} \frac{\mathrm{d}X_{s}}{X_{s}}-\bar \vartheta_{\tau_{\varepsilon} }\frac{t-\tau_{\varepsilon} }{t}=\frac{1}{t}{\int\nolimits}_{\tau_{\varepsilon} }^{t} \frac{\mathrm{d}X_{s}}{X_{s}} +\bar \vartheta_{\tau_{\varepsilon} }\frac{\tau_{\varepsilon} }{t}\\ &=& {\hat{\vartheta}}_{t,\varepsilon }+\bar \vartheta_{\tau_{\varepsilon} }\frac{\tau_{\varepsilon} }{t}-\frac{1}{t} {\int\nolimits}_{0}^{\tau_{\varepsilon} }\frac{\mathrm{d}X_{s}}{X_{s}}= {\hat{\vartheta}}_{t,\varepsilon }+ o(\varepsilon). \end{array}$$
Hence, the estimators $$\vartheta _{t,\varepsilon }^{\star }$$ and $${\hat {\vartheta }}_{t,\varepsilon }$$ have the same limit distribution. Therefore, the estimator-process is
$$\begin{array}{@{}rcl@{}} Y_{t,\varepsilon }^{\star}=X_{t}^{\mu ({\bar{\vartheta}}_{\tau _{\varepsilon} })}\frac{e^{\lambda ({\bar{\vartheta}}_{\tau _{\varepsilon} }) \left(T-t\right) }}{\sqrt{2\pi \left(T-t\right)\varepsilon^{2}\sigma ^{2}}}{\int\nolimits}_{-\infty }^{\infty} e^{-\frac{\left(\ln {X_{t}}-z\right)^{2}}{2\varepsilon ^{2}\sigma^{2}\left(T-t\right)}-{\mu({\bar{\vartheta}}_{\tau_{\varepsilon}}) z}}\;\Phi \left(e^{z}\right)\mathrm{d}z. \end{array}$$

It is easy to see that $$Y_{t,\varepsilon }^{\star }\longrightarrow \Phi \left (X_{T}\right)$$ as $$t\rightarrow T$$. The expression for $$Z_{t,\varepsilon }^{\star }$$ can be written as well.
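
The estimators of this example can be tried out numerically. The sketch below simulates the forward equation with an Euler scheme, computes the preliminary TFE on $$[0,\tau_{\varepsilon}]$$, the MLE, and the one-step MLE-process, with all integrals replaced by left Riemann sums. The parameter values, grid size, seed and function names are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np


def simulate_forward(theta0, sigma, eps, x0, T, n, rng):
    """Euler scheme for dX = theta0*X dt + eps*sigma*X dW on a uniform grid."""
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    growth = 1.0 + theta0 * dt + eps * sigma * dW   # X_{i+1} / X_i under Euler
    X = x0 * np.concatenate(([1.0], np.cumprod(growth)))
    return X, dt


def tfe(X, dt, m):
    """Trajectory fitting estimator on [0, tau], tau = m*dt."""
    H = np.concatenate(([0.0], np.cumsum(X[:m]) * dt))  # H_t = int_0^t X_s ds
    num = np.sum((X[: m + 1] - X[0]) * H) * dt
    den = np.sum(H ** 2) * dt
    return num / den


def mle_process(X, dt, m):
    """MLE (1/t) int_0^t dX_s / X_s at the grid points t_k, k = m..n."""
    n = len(X) - 1
    r = np.diff(X) / X[:-1]                 # increments dX_s / X_s
    t = dt * np.arange(m, n + 1)
    return t, np.cumsum(r)[m - 1:] / t


def one_step_mle_process(X, dt, m, theta_bar):
    """One-step MLE-process started from theta_bar at tau = m*dt."""
    n = len(X) - 1
    incr = np.diff(X) / X[:-1] - theta_bar * dt     # (dX_s - theta_bar X_s ds) / X_s
    integral = np.concatenate(([0.0], np.cumsum(incr[m:])))  # int_tau^{t_k}
    t = dt * np.arange(m, n + 1)
    return t, theta_bar + integral / t


# Illustrative values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
eps, sigma, theta0, x0, T, n = 0.01, 1.0, 0.5, 1.0, 1.0, 10_000
X, dt = simulate_forward(theta0, sigma, eps, x0, T, n, rng)
m = max(1, int(round(eps ** 0.75 / dt)))    # tau_eps = eps^(3/4)
theta_bar = tfe(X, dt, m)
t, theta_hat = mle_process(X, dt, m)
_, theta_star = one_step_mle_process(X, dt, m, theta_bar)
```

On the grid, `theta_star` and `theta_hat` differ only by the term $$\left(\bar\vartheta_{\tau_{\varepsilon}}\tau_{\varepsilon}-\int_{0}^{\tau_{\varepsilon}}X_{s}^{-1}\mathrm{d}X_{s}\right)/t$$, which mirrors the identity derived above for the continuous-time estimators.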

## Discussions

Note that we approximate the solution of the BSDE and not the equation itself. Of course, it is also possible to write the stochastic differential for $$Y_{t,\varepsilon }^{\star }$$. For simplicity of notation we consider the case $$k=1$$, $$d=1$$. Indeed, $$Y_{t,\varepsilon }^{\star }=u\left (t,X_{t},\vartheta _{t,\varepsilon }^{\star }\right)$$, where $$X_{t}$$ has the stochastic differential (9), and the process $$\vartheta _{t,\varepsilon }^{\star }$$ given by (23) can be written as follows
$$\begin{array}{@{}rcl@{}} &&\vartheta_{t,\varepsilon }^{\star}={\bar{\vartheta}}_{\tau_{\varepsilon} }+\varepsilon \mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s})\sigma \left(s,X_{s}\right)^{-1}\mathrm{d}W_{s} \\ &&\quad\qquad +\mathbb{I}_{t}\left({\bar{\vartheta}}_{\tau_{\varepsilon} }\right)^{-1}{\int\nolimits}_{\tau _{\varepsilon} }^{t}\dot{\mathbb{S}}({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s})\mathbb{A}\left(s,X_{s}\right)^{-1}\left[ S(\vartheta_{0 },s,X_{s})-S({\bar{\vartheta}}_{\tau_{\varepsilon} },s,X_{s})\right]\mathrm{d}s \\ &&\quad\; \,\,\,={\bar{\vartheta}}_{\tau_{\varepsilon} }+\varepsilon I_{t}^{-1}{\int\nolimits}_{\tau_{\varepsilon} }^{t}\alpha_{s,\varepsilon }\mathrm{d}W_{s} +I_{t}^{-1}{\int\nolimits}_{\tau_{\varepsilon} }^{t}\beta_{s,\varepsilon }\mathrm{d}s \end{array}$$
(29)

with obvious notation. Therefore, the stochastic differential for $$Y_{t,\varepsilon }^{\star }$$ can be written using the Itô formula.
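
For orientation, this computation can be sketched as follows. We assume here that (9) has the form $$\mathrm{d}X_{t}=S(\vartheta_{0},t,X_{t})\,\mathrm{d}t+\varepsilon\sigma(t,X_{t})\,\mathrm{d}W_{t}$$ (as the decomposition in (29) suggests), that $$I_{t}$$ is differentiable in $$t$$, and that $$u(t,x,\vartheta)$$ is sufficiently smooth; the symbol $$b_{t,\varepsilon}$$ is introduced only for this sketch. Since $$\vartheta_{t,\varepsilon}^{\star}-{\bar{\vartheta}}_{\tau_{\varepsilon}}$$ is the product of $$I_{t}^{-1}$$ and the sum of the two integrals in (29), the product rule gives
$$\begin{array}{@{}rcl@{}} \mathrm{d}\vartheta_{t,\varepsilon }^{\star}=\varepsilon I_{t}^{-1}\alpha_{t,\varepsilon }\,\mathrm{d}W_{t}+b_{t,\varepsilon }\,\mathrm{d}t,\qquad b_{t,\varepsilon }=I_{t}^{-1}\left[\beta_{t,\varepsilon }-\frac{\partial I_{t}}{\partial t}\left(\vartheta_{t,\varepsilon }^{\star}-{\bar{\vartheta}}_{\tau_{\varepsilon} }\right)\right], \end{array}$$
and the Itô formula applied to $$u\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star}\right)$$ yields
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t,\varepsilon }^{\star}&=&\left[u'_{t}+S(\vartheta_{0},t,X_{t})\,u'_{x}+\frac{\varepsilon^{2}\sigma^{2}}{2}\,u''_{xx}+b_{t,\varepsilon }\,u'_{\vartheta}+\varepsilon^{2}\sigma I_{t}^{-1}\alpha_{t,\varepsilon }\,u''_{x\vartheta}+\frac{\varepsilon^{2}}{2}I_{t}^{-2}\alpha_{t,\varepsilon }^{2}\,u''_{\vartheta\vartheta}\right]\mathrm{d}t\\ &&+\,\varepsilon\left[\sigma\, u'_{x}+I_{t}^{-1}\alpha_{t,\varepsilon }\,u'_{\vartheta}\right]\mathrm{d}W_{t},\qquad \tau_{\varepsilon}\leq t\leq T, \end{array}$$
where $$\sigma=\sigma(t,X_{t})$$ and all derivatives of $$u$$ are evaluated at $$\left(t,X_{t},\vartheta_{t,\varepsilon }^{\star}\right)$$.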

It was shown that the right-hand side of (29) tends to a constant $$\vartheta_{0}$$ as $$\varepsilon\rightarrow 0$$ and we can verify that $$\mathrm {d}\vartheta _{t,\varepsilon }^{\star }\rightarrow 0$$.

More detailed analysis shows that
$$\begin{array}{@{}rcl@{}} \mathrm{d}Y_{t,\varepsilon }^{\star}=-f\left(t,X_{t},Y_{t},Z_{t}\right)\mathrm{d}t+Z_{t}\mathrm{d}W_{t}+\varepsilon \mathrm{d}\eta_{t}+o(\varepsilon),\quad \tau_{\varepsilon} \leq t\leq T, \end{array}$$

where the Gaussian process $$\eta_{t}$$ is defined in Theorem 2. We used the relation $$Y_{t,\varepsilon }^{\star }=Y_{t}+\varepsilon \eta _{t}+o(\varepsilon)$$.

The multi-step MLE-processes used in this work can be useful in similar problems of BSDE approximation for discrete-time observations and ergodic diffusion models mentioned in the introduction (see Abakirova, A and Kutoyants, YA: On approximation of the BSDE. Large samples approach. In preparation, and Gasparyan and Kutoyants (2015)).

## Declarations

### Acknowledgments

This work was done with partial financial support of the RSF grant number 14-49-10079.

### Competing interests

I declare that there are no competing interests.

### Authors’ contributions

I read and approved the final manuscript.

## Authors’ Affiliations

(1)
Laboratoire Manceau des Mathématiques, Université du Maine Le Mans, France and National Research University “MPEI”

## References

1. Bismut, JM: Conjugate convex functions in optimal stochastic control. J. Math. Anal. Appl. 44, 384–404 (1973).
2. El Karoui, N, Peng, S, Quenez, M: Backward stochastic differential equations in finance. Math. Fin. 7, 1–71 (1997).
3. Fisher, RA: Theory of statistical estimation. Proc. Cambridge Philosophical Society. 22, 700–725 (1925).
4. Freidlin, MI, Wentzell, AD: Random Perturbations of Dynamical Systems. 2nd Ed. Springer, NY (1998).
5. Gasparyan, S, Kutoyants, YA: On approximation of the BSDE with unknown volatility in forward equation. Armenian J. Math. 7(1), 59–79 (2015).
6. Ibragimov, IA, Has’minskii, RZ: Statistical Estimation - Asymptotic Theory. Springer, New York (1981).
7. Jeganathan, P: Some asymptotic properties of risk functions when the limit of the experiment is mixed normal. Sankhya: The Indian Journal of Statistics. 45(Series A, Pt.1), 66–87 (1983).
8. Kamatani, K, Uchida, M: Hybrid multi-step estimators for stochastic differential equations based on sampled data. Statist. Inference Stoch. Processes. 18(2), 177–204 (2015).
9. Kutoyants, YA: Identification of Dynamical Systems with Small Noise. Kluwer Academic Publisher, Dordrecht (1994).
10. Kutoyants, YA: On approximation of the backward stochastic differential equation. Small noise, large samples and high frequency cases. Proceed. Steklov Inst. Mathematics. 287, 133–154 (2014).
11. Kutoyants, YA: On Multi-Step MLE-Process for Ergodic Diffusion. arXiv:1504.01869 [math.ST] (2015).
12. Kutoyants, YA, Motrunich, A: On multi-step MLE-process for Markov sequences. Metrika. 79(6), 705–724 (2016).
13. Kutoyants, YA, Zhou, L: On approximation of the backward stochastic differential equation. (arXiv:1305.3728). J. Stat. Plann. Infer. 150, 111–123 (2014).
14. Le Cam, L: On the asymptotic theory of estimation and testing hypotheses. In: Proc. 3rd Berkeley Symposium, vol. 1, pp. 129–156 (1956).
15. Lehmann, EL, Romano, JP: Testing Statistical Hypotheses. 3rd ed. Springer, NY (2005).
16. Liptser, R, Shiryaev, AN: Statistics of Random Processes. v.’s 1 and 2, 2-nd ed. Springer, NY (2001).
17. Pardoux, E, Peng, S: Adapted solution of a backward stochastic differential equation. Systems Control Lett. 14, 55–61 (1990).
18. Pardoux, E, Peng, S: Backward stochastic differential equations and quasilinear parabolic partial differential equations. Stochastic Partial Differential Equations and their Applications. Springer, Berlin (1992). (Lect. Notes Control Inf. Sci. 176).
19. Robinson, PM: The stochastic difference between econometric statistics. Econometrica. 56(3), 531–548 (1988).
20. Skorohod, AV, Khasminskii, RZ: On parameter estimation by indirect observations. Prob. Inform. Transm. 32, 58–68 (1996).
21. Uchida, M, Yoshida, N: Adaptive Bayes type estimators of ergodic diffusion processes from discrete observations. Statist. Inference Stoch. Processes. 17(2), 181–219 (2014).