Affine processes under parameter uncertainty

We develop a one-dimensional notion of affine processes under parameter uncertainty, which we call non-linear affine processes. This is done as follows: given a set of parameters for the process, we construct a corresponding non-linear expectation on the path space of continuous processes. By a general dynamic programming principle we link this non-linear expectation to a variational form of the Kolmogorov equation, where the generator of a single affine process is replaced by the supremum over all corresponding generators of affine processes with parameters in the parameter set. This non-linear affine process yields a tractable model for Knightian uncertainty, especially for modelling interest rates under ambiguity. We then develop an appropriate Itô-formula and the respective term-structure equations, and study the non-linear versions of the Vasiček and the Cox-Ingersoll-Ross (CIR) model. Thereafter we introduce the non-linear Vasiček-CIR model. This model is particularly suitable for modelling interest rates when one does not want to restrict the state space a priori, and hence the approach solves the modelling issue arising with negative interest rates.


Introduction
The modelling of a dynamic and unpredictable phenomenon like stock markets or interest rate markets is often approached by choosing an appropriate stochastic model. In many cases, the choice of the model is a delicate and difficult question. In complex dynamic environments like financial markets it is rather the rule than the exception that unforeseen events lead to difficulties with the a-priori chosen model and improvements of the model have to be developed and implemented.
A prominent example in this direction is the role of affine short-rate models in the last 20 years: while around 2000 the property of the Vasiček model that interest rates can become negative was heavily criticized and the non-negative Cox-Ingersoll-Ross (CIR) model was preferred, the consequences of the financial crisis of 2007-2008, which led to negative interest rates in the Euro zone, rendered the CIR model no longer applicable and led to a renaissance of the Vasiček model. This example illustrates the important question of model uncertainty, which is one of the most important topics in applied sciences and in particular plays a prominent role in finance, not only since the financial crisis. The apparent risk of losses due to model mis-specification, called model risk, fostered the development of strategies which are robust against model risk, typically leading to non-linear pricing rules. These robust strategies play a prominent role in the literature, see Denis and Martini (2006), Cont (2006), Eberlein et al. (2014), Madan (2016), Acciaio et al. (2016), Muhle-Karbe and Nutz (2018), Bielecki et al. (2018) and the book Guyon and Henry-Labordère (2013), to name just a few references in this direction.
A key observation in these works is that the single probability measure used in the classical approaches to specify a model has to be replaced by a family of probability measures (i.e. a full class of models). Such an approach is very natural from the statistical viewpoint: when a model has certain parameters to be estimated, the estimators carry statistical uncertainty and one considers confidence intervals instead, corresponding to a family of probability measures. The latter formulation of model risk is typically referred to as parameter uncertainty, see Avellaneda et al. (1995), Wilmott and Oztukel (1998), Fouque and Ren (2014), and is a major motivation for our research.
Date: March 27, 2019.
Examples in this direction are the notions of g-Brownian motion and G-Brownian motion referring to a Brownian motion with drift or volatility uncertainty, see Peng (1997, 2007a) and references therein. Most recently, this theory has been extended to more general approaches, so-called non-linear Lévy processes, see Neufeld and Nutz (2017) and Denk et al. (2017) in this regard.
Here we generalize this notion to affine processes under parameter uncertainty (called non-linear affine processes). While a classical affine process corresponds to a single semimartingale law, we represent the affine process under parameter uncertainty by a family of semimartingale laws whose differential characteristics are bounded from above and below by affine functions of the current states. Non-linear Lévy processes constitute the special case where the bounds do not depend on the state of the process. It seems important to stress that for affine processes the bounds on drift and volatility are allowed to depend on the state of the process (in an affine way, however). This state dependence leads to a number of additional difficulties, and we therefore restrict ourselves to the simplest case, namely the one-dimensional case without jumps.
It is our aim to provide the appropriate tools for incorporating parameter uncertainty in the prominent class of affine models. This naturally leads to a non-linear version of affine processes and associated non-linear expectations. After having established a dynamic programming principle, we establish the connection to the non-linear Kolmogorov equation. This allows us to take a number of further steps, namely a non-linear version of the Itô-formula and non-linear affine term structure equations.
We also provide a number of examples: besides a non-linear variant of the Black-Scholes model and non-linear Vasiček and Cox-Ingersoll-Ross (CIR) models we also introduce a non-linear Vasiček-CIR model. In the latter model one can incorporate negative interest rates in combination with a CIR-like behaviour, solving the problem raised in many practical applications when the state space needed to be restricted to positive interest rates (see Carver (2012)).
The paper is organized as follows: in Section 2 we introduce non-linear affine processes. In Section 3 we prove that a dynamic programming principle holds. Section 4 provides the non-linear Kolmogorov equation and as examples the non-linear Vasiček model and the non-linear CIR model. In Section 5 we provide a non-linear Itô-formula together with some examples and in Section 6 we study (non-linear) affine term structure models. Section 7 studies the application to model risk and Section 8 concludes.

Setup
We begin with a short review of continuous affine processes in one dimension. For a detailed exposition we refer to Duffie et al. (2003) and Filipović (2009). Consider the canonical state space $\mathcal{X}$, which is either $\mathcal{X} = \mathbb{R}$ or $\mathcal{X} = \mathbb{R}_{>0}$. A (time-homogeneous) Markov process $X$ with values in the state space $\mathcal{X}$ is called affine if the conditional characteristic function of $X$ is exponential affine. This means that there exist $\mathbb{C}$-valued functions $\phi(t,u)$ and $\psi(t,u)$ such that
$$E\big[e^{uX_T} \mid X_t\big] = e^{\phi(T-t,u) + \psi(T-t,u) X_t}$$
for all complex $u \in \{ix : x \in \mathbb{R}\}$ and $0 \le t \le T$. The key for our non-linear formulation will be a characterization of $X$ in terms of stochastic differential equations: more precisely, the affine process $X$ is the unique strong solution of the stochastic differential equation
$$dX_t = (b_0 + b_1 X_t)\,dt + \sqrt{a_0 + a_1 X_t}\,dW_t, \qquad X_0 = x, \tag{1}$$
where the drift parameter $b_0 + b_1 X_t$ and the diffusion parameter $a_0 + a_1 X_t$ depend on the current value of $X$ in an affine way. Here, the process $W$ is a standard Brownian motion. It should be noted that, depending on the state space, not all parameter combinations are possible, but only those combinations which are admissible in the sense made precise in Theorem 10.2 in Filipović (2009). For our case this implies that, on the one hand, if $\mathcal{X} = \mathbb{R}$ we necessarily have $a_1 = 0$ and $a_0 > 0$ and, on the other hand, if $\mathcal{X} = \mathbb{R}_{>0}$ we obtain $a_0 = 0$, $a_1 > 0$ and $b_0 > 0$. In addition, the coefficients $\phi$ and $\psi$ solve ODEs (classified as Riccati equations), which is the reason for the high degree of tractability of affine processes in the sense that explicit calculations are possible or efficient numerical methods are available; see Duffie et al. (2003), Filipović (2009) for details and applications in this regard.
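To make the role of the Riccati equations concrete, the following minimal sketch integrates them numerically. It assumes the standard one-dimensional form $\partial_t\phi = b_0\psi + \tfrac12 a_0\psi^2$, $\partial_t\psi = b_1\psi + \tfrac12 a_1\psi^2$ with $\phi(0,u)=0$, $\psi(0,u)=u$; the function names `riccati_rhs`, `solve_riccati` and `char_fn` are illustrative and not taken from the paper.

```python
import cmath


def riccati_rhs(psi, theta):
    """Right-hand sides of the Riccati ODEs for theta = (b0, b1, a0, a1)."""
    b0, b1, a0, a1 = theta
    dphi = b0 * psi + 0.5 * a0 * psi ** 2
    dpsi = b1 * psi + 0.5 * a1 * psi ** 2
    return dphi, dpsi


def solve_riccati(u, t, theta, steps=2000):
    """Classical RK4 integration from 0 to t; u may be real or complex."""
    h = t / steps
    phi, psi = 0j, complex(u)
    for _ in range(steps):
        # The right-hand side depends on psi only, so phi is a pure quadrature.
        k1 = riccati_rhs(psi, theta)
        k2 = riccati_rhs(psi + 0.5 * h * k1[1], theta)
        k3 = riccati_rhs(psi + 0.5 * h * k2[1], theta)
        k4 = riccati_rhs(psi + h * k3[1], theta)
        phi += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        psi += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return phi, psi


def char_fn(u, t, x, theta):
    """E[exp(u X_t) | X_0 = x] = exp(phi(t, u) + psi(t, u) x)."""
    phi, psi = solve_riccati(u, t, theta)
    return cmath.exp(phi + psi * x)
```

In the Vasiček case ($a_1 = 0$) the solution is known in closed form, $\psi(t,u) = u e^{b_1 t}$, which gives a simple sanity check for the integrator.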
2.1. Non-linear affine processes. In this section we introduce the necessary tools for defining affine processes under parameter uncertainty. To this end, fix a final time horizon $T > 0$ and let $\Omega = C([0,T])$ be the canonical space of continuous, one-dimensional paths. We endow $\Omega$ with the topology of uniform convergence and denote by $\mathcal{F}$ its Borel σ-field. Let $X$ be the canonical process $X_t(\omega) = \omega_t$, and let $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ with $\mathcal{F}_t = \sigma(X_s, 0 \le s \le t)$ be the (raw) filtration generated by $X$.
As we are interested in semimartingale laws on $\Omega$, we begin by denoting by $\mathfrak{P}(\Omega)$ the Polish space of all probability measures on $\Omega$ equipped with the topology of weak convergence¹. The process $X$ will be called a (continuous) $P$-$\mathbb{F}$-semimartingale, for $P \in \mathfrak{P}(\Omega)$, if there exist processes $B = B^P$ and $M = M^P$ such that $X = X_0 + B + M$, where $B$ has continuous paths of (locally) finite variation $P$-a.s., $M$ is a continuous $P$-$\mathbb{F}$-local martingale and $B_0 = M_0 = 0$.
It will be important in the following that, by Proposition 2.2 in Neufeld and Nutz (2014), $X$ is a $P$-$\mathbb{F}$-semimartingale if and only if it is a $P$-semimartingale with respect to the right-continuous filtration $\mathbb{F}_+ = (\mathcal{F}_{t+})_{t \ge 0}$ or with respect to the usual augmentation $\mathbb{F}^P_+$; here $\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$. Hence, in the following we can consider semimartingales with respect to the raw filtration $\mathbb{F}$.
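The $P$-$\mathbb{F}$-characteristics of a continuous semimartingale $X = X_0 + B^P + M^P$ in the above representation is the pair $(B^P, C)$, where $C = \langle M^P \rangle$. The non-negative process $C$ does not depend on $P$, as the quadratic variation is a path property². For the following, we will focus on semimartingales whose characteristics are absolutely continuous (a.c.), i.e. there exist predictable processes $\beta^P$ and $\alpha \ge 0$ such that
$$B^P = \int_0^{\cdot} \beta^P_s\,ds, \qquad C = \int_0^{\cdot} \alpha_s\,ds. \tag{2}$$
A probability measure $P \in \mathfrak{P}(\Omega)$ is called a semimartingale law for $X$ if $X$ is a $P$-$\mathbb{F}$-semimartingale. We denote by
$$\mathfrak{P}^{ac}_{sem} = \{P \in \mathfrak{P}(\Omega) \mid X \text{ is a } P\text{-}\mathbb{F}\text{-semimartingale with a.c. characteristics}\}$$
the set of all semimartingale laws of $X$ which have absolutely continuous characteristics. In the following we will always denote by $(\beta^P, \alpha)$ as in (2) the differential characteristics of $X$ under $P \in \mathfrak{P}^{ac}_{sem}$.
Our main goal is to allow for a specific version of model risk in the sense that there is uncertainty on the parameter vector $\theta = (b_0, b_1, a_0, a_1)$ of the affine process. We assume that there is additional information on bounds on the parameter vector $\theta$ and denote these finite bounds by $\underline{b}_i, \bar{b}_i, \underline{a}_i, \bar{a}_i$, $i = 0, 1$, respectively. This leads to the compact set
$$\Theta := [\underline{b}_0, \bar{b}_0] \times [\underline{b}_1, \bar{b}_1] \times [\underline{a}_0, \bar{a}_0] \times [\underline{a}_1, \bar{a}_1]. \tag{3}$$
We are interested in the intervals generated by the associated affine functions. In this regard, let $B := [\underline{b}_0, \bar{b}_0] \times [\underline{b}_1, \bar{b}_1]$ and $A := [\underline{a}_0, \bar{a}_0] \times [\underline{a}_1, \bar{a}_1]$. Moreover, for $a \in \mathbb{R}^2$ we write $a := (a_0, a_1)$, and similarly for $b \in \mathbb{R}^2$. Furthermore, for $x \in \mathbb{R}$ let
$$b^*(x) := \{b_0 + b_1 x : b \in B\}, \qquad a^*(x) := \{a_0 + a_1 x^+ : a \in A\} \tag{4}$$
denote the associated set-valued functions. As the state space will, in general, be $\mathbb{R}$, we have to ensure non-negativity of the quadratic variation, which is achieved using $(\cdot)^+ := \max\{\cdot, 0\}$ in the definition of $a^*$. Due to the nice structure of $\Theta$ the sets are always intervals: indeed,
$$b^*(x) = \big[\underline{b}_0 + \underline{b}_1 x \mathbf{1}_{\{x \ge 0\}} + \bar{b}_1 x \mathbf{1}_{\{x < 0\}},\ \bar{b}_0 + \bar{b}_1 x \mathbf{1}_{\{x \ge 0\}} + \underline{b}_1 x \mathbf{1}_{\{x < 0\}}\big], \qquad a^*(x) = \big[\underline{a}_0 + \underline{a}_1 x^+,\ \bar{a}_0 + \bar{a}_1 x^+\big].$$
¹ The weak topology is the topology induced by the bounded continuous functions on $\Omega$. Then, $\mathfrak{P}(\Omega)$ is a separable metric space and we denote the associated Borel σ-field by $\mathcal{B}(\mathfrak{P}(\Omega))$.
² This is because $C$ can be constructed as a single process not depending on $P$; that is, two measures under which $X$ has different diffusion are necessarily singular. See Proposition 6.6 in Neufeld and Nutz (2014) for the construction of $C$.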
Clearly, it would be possible to consider more general Θ, which is, however, not our focus here.
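The interval structure of the drift and diffusion bounds can be sketched in a few lines. The snippet assumes that the set-valued functions are $b^*(x) = \{b_0 + b_1 x\}$ and $a^*(x) = \{a_0 + a_1 x^+\}$ over the box $\Theta$, as described in the text; the function names and the representation of bounds as `(lower, upper)` pairs are illustrative choices.

```python
def b_star(x, b0_bounds, b1_bounds):
    """Interval of attainable drifts b0 + b1 * x at state x."""
    (b0_lo, b0_hi), (b1_lo, b1_hi) = b0_bounds, b1_bounds
    if x >= 0:
        return (b0_lo + b1_lo * x, b0_hi + b1_hi * x)
    # For negative x the roles of the bounds on b1 are swapped.
    return (b0_lo + b1_hi * x, b0_hi + b1_lo * x)


def a_star(x, a0_bounds, a1_bounds):
    """Interval of attainable diffusion values a0 + a1 * max(x, 0)."""
    (a0_lo, a0_hi), (a1_lo, a1_hi) = a0_bounds, a1_bounds
    x_plus = max(x, 0.0)
    return (a0_lo + a1_lo * x_plus, a0_hi + a1_hi * x_plus)
```

Since both maps are affine in the parameters, the endpoints are attained at corners of the box, which is why no optimisation routine is needed.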
An affine process can be uniquely characterized by its transition probabilities. We will make use of this fact for characterizing affine processes under model uncertainty. Moreover, we denote by $\mathcal{O}$ the considered state space, which will be either $\mathbb{R}$, $\mathbb{R}_{\ge 0}$ or $\mathbb{R}_{>0}$.
Definition 2.2. Let $\Theta$ be a set as in (3) with associated $a^*$, $b^*$ as in (4). A non-linear affine process starting at $x \in \mathcal{O}$ is the family of semimartingale laws $P \in \mathfrak{P}^{ac}_{sem}$ such that (i) $P(X_0 = x) = 1$, (ii) $P$ is affine-dominated by $\Theta$.
As explained in the introduction, parameter uncertainty is represented by a family of models replacing the one single model in the approaches without uncertainty: according to Definition 2.2, the affine process under parameter uncertainty is represented by a family of semimartingale laws instead of a single one. We denote the semimartingale laws P ∈ P ac sem , satisfying P (X 0 = x) = 1 and being affine-dominated by Θ by A(x, Θ). Intuitively, this corresponds to a non-linear affine process starting in x.
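To illustrate what a single member of such a family may look like, the following sketch simulates one path whose drift and diffusion coefficients are redrawn inside the box $\Theta$ at every time step. It is only an Euler approximation under stated assumptions: the random, piecewise constant choice of parameters is one hypothetical selection among many, the bounds on $a_0$ and $a_1$ are assumed non-negative, and the name `simulate_dominated_path` is not from the paper.

```python
import numpy as np


def simulate_dominated_path(x0, theta_bounds, T=1.0, n_steps=250, seed=0):
    """Euler path with (b0, b1, a0, a1) redrawn uniformly in the box each step.

    theta_bounds is a sequence of four (lower, upper) pairs in the order
    b0, b1, a0, a1. Returns the path and the drift and diffusion values used.
    """
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in theta_bounds], dtype=float)
    hi = np.array([b[1] for b in theta_bounds], dtype=float)
    dt = T / n_steps
    path = np.empty(n_steps + 1)
    betas = np.empty(n_steps)
    alphas = np.empty(n_steps)
    path[0] = x0
    for k in range(n_steps):
        b0, b1, a0, a1 = lo + (hi - lo) * rng.random(4)
        x = path[k]
        betas[k] = b0 + b1 * x              # lies in b*(x) by construction
        alphas[k] = a0 + a1 * max(x, 0.0)   # lies in a*(x) by construction
        noise = rng.standard_normal()
        path[k + 1] = x + betas[k] * dt + np.sqrt(alphas[k] * dt) * noise
    return path, betas, alphas
```

Changing the rule by which the parameters are selected (for instance, always taking the upper bounds) produces a different element of the family, which is the intuition behind replacing one model by a set of models.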
It is well-known that the state space $\mathcal{O}$ needs to be chosen in correspondence with the choice of $\Theta$: indeed, the squared Bessel process is an affine process with state space $\mathbb{R}_{>0}$ (see, for example, Karatzas and Shreve (1988), Prop. 3.22 of Ch. 3) and the set $\mathcal{A}(x, \Theta)$ will be empty for $x < 0$. To exclude additional difficulties in this direction, we call a family of non-linear affine processes $(\mathcal{A}(x, \Theta))_{x \in \mathcal{O}}$ with state space $\mathcal{O}$ proper if either $\underline{a}_0 > 0$ holds, or $\underline{a}_0 = \bar{a}_0 = 0$ and $\underline{b}_0 \ge \bar{a}_1/2 > 0$. It is clear that in the case with $\mathcal{O} = \mathbb{R}$ the assumption $\underline{a}_0 > 0$ is sufficient for reaching the full state space. The case with non-negative state spaces is more delicate. We concentrate on the case $\mathcal{O} = \mathbb{R}_{>0}$. The following proposition gives a sufficient condition in this regard. It moreover shows that the non-linear affine process does not reach zero, in the sense that the event of reaching 0 has zero probability under all $P \in \mathcal{A}(x, \Theta)$.
Proposition 2.3. Let $x > 0$, and assume that $\underline{a}_0 = \bar{a}_0 = 0$ and $\underline{b}_0 \ge \bar{a}_1/2 > 0$. Then for any $P \in \mathcal{A}(x, \Theta)$ it holds that $P(X_t > 0,\ 0 \le t \le T) = 1$.
Proof. Let P ∈ A(x, Θ), denote by β P and α the associated processes from Equation (2), and denote by M P the P -local martingale part of the P -semimartingale X. Moreover, for any c ≥ 0 define We need to show for any time T > 0 that P [τ 0 ≤ T ] = 0. To that end, fix an arbitrary T > 0. We adapt the method in Gikhman (2011) to our setting: let ε be such that 0 < ε < x. Notice that by continuity of the paths of X, we have that X ≥ ε > 0 on [[0, τ ε ]]. Moreover, by assumption, Itô's formula yields for any t ≥ 0 that Clearly, both M and M̃ are local martingales and M = −m M̃ on [[0, τ ε ]]. We now show that M̃ is a true martingale. By the Burkholder-Davis-Gundy inequality, since X ≥ ε on [[0, τ ε ]], we obtain that and hence M̃ is indeed a martingale. Therefore, M is a true martingale on [[0, τ ε ]]. Next, since P ∈ A(x, Θ) and m > 0, we obtain the estimate Taking expectations and using that X ≥ ε > 0 on [[0, τ ε ]] yields, in view of (7), for all t ≥ 0 that Moreover, as X ≥ ε > 0 on [[0, τ ε ]], we obtain that This together with (8) yields for any t ≥ 0 that By means of Gronwall's inequality we obtain for all t ≥ 0 that In particular, we obtain by Chebyshev's inequality for our fixed time T > 0 that Inserting (9) and letting ε tend to zero yields the claim. The proof of Proposition 2.3 is thus complete.
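The properness condition is easy to check mechanically. The helper below is a small sketch, with `is_proper` a hypothetical name and the bounds passed as `(lower, upper)` pairs in the order $b_0, b_1, a_0, a_1$. In the second case the requirement is that the smallest admissible $b_0$ dominates half of the largest admissible $a_1$, which for a single classical CIR model reduces to the Feller condition $b_0 \ge a_1/2$.

```python
def is_proper(theta_bounds):
    """Check the two alternatives in the definition of a proper family."""
    (b0_lo, _b0_hi), _b1, (a0_lo, a0_hi), (_a1_lo, a1_hi) = theta_bounds
    if a0_lo > 0:
        # Non-vanishing volatility: full state space R.
        return True
    # Degenerate a0: worst-case Feller-type condition for state space R_{>0}.
    return a0_lo == 0 and a0_hi == 0 and b0_lo >= a1_hi / 2 > 0
```

For example, bounds with $b_0 \in [0.05, 0.1]$ and $a_1 \in [0.02, 0.08]$ pass the check, whereas widening the volatility interval to $a_1 \in [0.02, 0.2]$ fails it.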

Dynamic programming
One of the key insights for Markov processes is the deep link between Markov processes (and their expectations) and partial differential equations, given by the Kolmogorov equation. In this section we generalize this relation to the case with parameter uncertainty, i.e. we develop the relation of the non-linear affine process to a non-linear version of the Kolmogorov equation. The path we detail in this section uses dynamic programming and the results obtained in Nutz and van Handel (2013) and El Karoui and Tan (2013a). The key to dynamic programming is a certain stability property under conditioning and pasting. As the non-linear affine processes we considered up to now always start from time t = 0, we introduce the appropriate conditional formulations first.
For the remainder of the section, fix Θ as in (3) with associated a * and b * as in (4). Denote by A(t, x, Θ) the semimartingale laws P ∈ P ac sem , such that (i) P (X t = x) = 1, and (ii) P is affine-dominated on (t, T ] by Θ. The following result yields measurability of the non-linear affine process starting at t in ω(t).
Proof. By Theorem 2.6 in Neufeld and Nutz (2014), the set P ac sem is Borel, which proves that (ω, t, P ) ∈ Ω × [0, T ] × P ac sem P (X t = ω(t)) = 1 is Borel. Moreover, Theorem 2.6 in Neufeld and Nutz (2014), also grants the existence of a Borel-measurable map (P, ω, s) → (β P s ( ω), α s ( ω)) such that (β P , α) are the differential characteristics of X under P . Therefore, we obtain the Borel measurability of the set Applying Fubini's theorem yields the Borel measurability of the set (ω, t, P, ·)] = 1 . and the right hand side is Borel measurable due to a monotone class argument as in (Neufeld and Nutz, 2014, Lemma 3.1).
Proof. This follows directly from (Neufeld and Nutz, 2017, Theorem 2.1), see also (Guo et al., 2017, Lemma 4.6). Now, fix a Borel measurable function ψ : O → R and define the value function v : Using the above results, we obtain the following dynamic programming principle.
Proof. The result follows from Theorem 2.1 in El Karoui and Tan (2013b) (with P t,x = A(t, x, Θ) in the notation of El Karoui and Tan (2013b)) by noting that analyticity is implied by measurability as shown in Lemma 3.1 and the required stability assumptions have been shown in Lemma 3.2.
3.1. Continuity of the value function. In the following, we show the continuity of the value function v(t, x). To this end, introduce the constant which is finite by Assumption (3). The following inequality is the cornerstone of the results in this section.
Lemma 3.4. Consider a proper family of non-linear affine processes with state space O and let q ≥ 1. There exists an 0 < ε ≡ ε(q) < 1 such that for all for some constant C = C(x, q) > 0 which may depend on x, but is independent of h and t.
Proof. Let q ∈ [1, ∞). Consider P ∈ A(t, x, Θ) and denote by X s = x + B P s + M P s , s ≥ t, the semimartingale representation of X with predictable finite-variation part B P and local martingale M P . In the following we will repeatedly use the elementary inequality that and denote c q := 2 q−1 . The Burkholder-Davis-Gundy (BDG) inequality (see Theorem 4.1 in Revuz and Yor (1999)) together with Jensen's inequality and (11) Note that the constant C q ≥ 1 from the BDG inequality does depend on q only. Let K := 1 + K, where K is the constant defined in (10). Choose any 0 < ε = ε(q) < 1 small enough such that it satisfies Let us verify that such a fixed ε satisfies the desired property: by the very definition of P ∈ A(t, x, Θ), we have on [t, t + h] that both α and |β P | are bounded from above by K + K sup 0≤s≤h |X t+s | ≥ 1 since they are affine dominated. This, together with Jensen's inequality yields that In a similar way we obtain Inserting these inequalities into (12), considering h ≤ ε and noting thatC q ≥ 1 implies that Since h ≤ ε and we chose 0 < ε < 1 such that (13) holds, we obtain for the constant As P ∈ A(t, x, Θ) was chosen arbitrarily, the claim is proven.
Remark 3.5. The proof of Lemma 3.4 actually shows that, for the corresponding $0 < \varepsilon < 1$ and all $x \in \mathbb{R}$, the local martingale part $(M^P_{t+s})_{0 \le s \le h}$ restricted to $[t, t+h]$ is a true martingale for any $P \in \mathcal{A}(t, x, \Theta)$.
Lemma 3.6. Consider a proper family of non-linear affine processes with state space $\mathcal{O}$ and let $\psi : \mathcal{O} \to \mathbb{R}$ be Lipschitz continuous with constant $L_\psi$. Then the value function $v$ is jointly continuous. In particular, $v(t, \cdot)$ is Lipschitz continuous with constant $L_\psi$ and $v(\cdot, x)$ is locally 1/2-Hölder continuous.
Proof. For the Lipschitz-continuity of v(t, ·), observe that for any t For the locally 1/2-Hölder continuity, let t ∈ [0, T ) and 0 ≤ u ≤ T −t be small enough. Then the dynamic programming principle derived in Proposition 3.3, the Lipschitz continuity of v(t, ·) and Lemma 3.4 yield Letting n go to infinity yields the result.

The Kolmogorov equation
In this section we provide the link between the non-linear affine process and the associated non-linear Kolmogorov equation. More precisely, we relate the non-linear affine process to a (fully) non-linear partial differential equation (PDE). This is achieved by a probabilistic construction involving an optimal control problem on the canonical space of continuous paths where the controls are laws of affine-dominated semimartingales. To this end, note that the affine process $X$ given in Equation (1) is uniquely characterized by its infinitesimal generator,
$$f \mapsto (b_0 + b_1 x)\, f'(x) + \tfrac{1}{2}(a_0 + a_1 x)\, f''(x).$$
This follows by solving the associated martingale problem, see, e.g., Theorem 21.7 in Kallenberg (2002).
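Consider the state space $\mathcal{O}$, which will be either $\mathbb{R}$, $\mathbb{R}_{\ge 0}$ or $\mathbb{R}_{>0}$. Fix $\psi : \mathcal{O} \to \mathbb{R}$ and consider the fully non-linear PDE
$$-\partial_t v(t,x) - G\big(x, \partial_x v(t,x), \partial_{xx} v(t,x)\big) = 0 \ \text{ on } [0,T) \times \mathcal{O}, \qquad v(T,x) = \psi(x), \tag{16}$$
where
$$G(x,p,q) := \sup_{(b_0,b_1,a_0,a_1) \in \Theta} \Big\{ (b_0 + b_1 x)\,p + \tfrac{1}{2}(a_0 + a_1 x^+)\,q \Big\}. \tag{17}$$
The function $-G$ satisfies the degenerate ellipticity condition and, as $\Theta$ is compact, it is also continuous. Observe that the PDE defined in (16) can be seen as a non-linear affine PDE, since for each $\theta = (b_0, b_1, a_0, a_1) \in \Theta$ the expression inside the supremum in (17) is the generator of the affine process with parameter $\theta$ applied to $v$. An upper semicontinuous function $u$ is called a viscosity subsolution of (16) if $u(T, \cdot) \le \psi$ and
$$-\partial_t \varphi(t,x) - G\big(x, \partial_x \varphi(t,x), \partial_{xx} \varphi(t,x)\big) \le 0$$
whenever $\varphi \in C^{2,3}_b([0,T) \times \mathbb{R})$ is such that $\varphi \ge u$ and $\varphi(t,x) = u(t,x)$. The definition of a viscosity supersolution is obtained by reversing the inequalities and the semicontinuity. Finally, a continuous function is a viscosity solution if it is both a sub- and a supersolution.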
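Because the expression inside the supremum is affine in the parameters and $\Theta$ is a box, the non-linearity can be evaluated in closed form. The following sketch assumes the form $G(x,p,q) = \sup_{\theta \in \Theta}\{(b_0 + b_1 x)p + \tfrac12 (a_0 + a_1 x^+) q\}$; the names `G` and `G_vertices` are illustrative, and the brute-force version over the 16 corners of the box is included only as a cross-check.

```python
import itertools


def G(x, p, q, theta_bounds):
    """Closed-form supremum over the box, using the sign of p and q."""
    (b0_lo, b0_hi), (b1_lo, b1_hi), (a0_lo, a0_hi), (a1_lo, a1_hi) = theta_bounds
    x_plus = max(x, 0.0)
    # Endpoints of the drift interval b*(x) and diffusion interval a*(x).
    if x >= 0:
        b_lo, b_hi = b0_lo + b1_lo * x, b0_hi + b1_hi * x
    else:
        b_lo, b_hi = b0_lo + b1_hi * x, b0_hi + b1_lo * x
    a_lo, a_hi = a0_lo + a1_lo * x_plus, a0_hi + a1_hi * x_plus
    drift = b_hi * p if p >= 0 else b_lo * p
    diffusion = 0.5 * (a_hi * q if q >= 0 else a_lo * q)
    return drift + diffusion


def G_vertices(x, p, q, theta_bounds):
    """Brute force: an affine function on a box is maximal at a corner."""
    x_plus = max(x, 0.0)
    return max(
        (b0 + b1 * x) * p + 0.5 * (a0 + a1 * x_plus) * q
        for b0, b1, a0, a1 in itertools.product(*theta_bounds)
    )
```

In words: positive slope selects the largest drift, positive curvature selects the largest volatility, and conversely for negative signs.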
We obtain a stochastic representation for the non-linear affine PDE (16).
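Theorem 4.1. Consider a proper family of non-linear affine processes with state space $\mathcal{O}$ and let $\psi : \mathcal{O} \to \mathbb{R}$ be Lipschitz continuous. Then the value function
$$v(t,x) := \sup_{P \in \mathcal{A}(t,x,\Theta)} E^P[\psi(X_T)]$$
is a viscosity solution of the non-linear PDE in (16).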
Proof. The proof essentially follows the well-known standard arguments in stochastic control, see e.g. the proof of (Neufeld and Nutz, 2017, Proposition 5.4). By Lemma 3.6, v(t, x) is continuous on [0, T ) × R, and we have v(T, x) = ψ(x) by the definition of v. We show that v is a viscosity subsolution of the non-linear affine PDE defined in (16); the supersolution property is proved similarly. We remark that in the subsequent lines within this proof, C > 0 is a constant whose values may change from line to line.
. By the dynamic programming principle obtained in Proposition 3.3 we have for any Fix any P ∈ A(t, x, Θ), denote as above by (β P , α) the differential characteristics of the continuous semimartingale X under P , and denote by M P the P -local martingale part of the P -semimartingale X. Then, Itô's formula yields As ϕ ∈ C 2,3 b ([0, T )×R), ∂ x ϕ is uniformly bounded, thus by Remark 3.5, we see that for small enough 0 < u < T − t the local martingale part in (19) is in fact a true martingale, starting at 0. In particular, its expectation vanishes. The next step is to estimate the expectation of the other terms. In this regard, note that Since ϕ ∈ C 2,3 b , ∂ x ϕ is Lipschitz. Hence we obtain with the constant K from Equation (10) together with Lemma 3.4 that for small enough u, Inserting (21) into (20) yields The same argument applied to ∂ xx ϕ leads to Moreover, by a similar calculation, we have As above we write θ := (b 0 , b 1 , a 0 , a 1 ) for an element in Θ. Then, by taking expectations in (19) and using (20)-(24) yields Here the supremum turns out to be G(X t+s , ∂ x ϕ(t, x), ∂ xx ϕ(t, x)). Note that by the very definition of G, Therefore, by using that ϕ ∈ C 2,3 b , the definition of the constant K in (10) and Lemma 3.4, we have Combining (25)-(26) yields for some constant C > 0 which is independent of P . As the choice of P ∈ A(t, x, Θ) was arbitrary, we deduce from (18) By dividing first in (28) by −u and then let u go to zero, we obtain that which proves that v is indeed a viscosity subsolution as desired.
4.1. Uniqueness. Uniqueness in our framework is not covered by standard arguments as in Fleming and Soner (2006) or Crandall et al. (1992), since the diffusion coefficient does not satisfy a global Lipschitz condition. This is a well-known difficulty already discussed in Feller (1951). To overcome this, we have to distinguish the two cases where the state space is either $\mathbb{R}$ or $\mathbb{R}_{>0}$. We begin with the general case, which covers the non-linear Vasiček-CIR model discussed in detail in Section 6.1. In this case we do not decide a priori whether a Vasiček or a Cox-Ingersoll-Ross (CIR) model takes place and therefore consider the full space $\mathbb{R}$ as state space. To achieve this, we assume throughout a non-vanishing volatility, i.e. $\underline{a}_0 > 0$. This prevents deterministic behaviour on $\mathbb{R}_{<0}$ and the emergence of singularities when the process reaches zero from above. On the other hand, we do not need any assumptions on $a_1$.
Proof. Uniqueness in this case follows from the observation that the coefficients are Lipschitz once $\underline{a}_0 > 0$. Indeed, following the standard procedure detailed in Section V.9 in Fleming and Soner (2006) then allows us to extend the uniqueness results from Corollary V.8.1 therein to an unbounded domain as considered here.
On the other hand, if we consider the case where $\mathbb{R}_{>0}$ is the state space, we will necessarily require $\underline{a}_0 = \bar{a}_0 = 0$. Then the (bounds on the) diffusion coefficient no longer satisfy a global Lipschitz property and the standard methodology cannot be applied. To the best of our knowledge, only Costantini et al. (2012) and Amadori (2007) treat this setting, and we will apply here the techniques from the second article.
To begin with, note that existence of a solution under the Lipschitz property of $\psi$ follows from Theorem 4.1. Furthermore, Theorem 4 in Amadori (2007) yields the desired uniqueness in our case. We recall this result together with its assumptions in Appendix B. The claim then follows from Theorem B.1.
Remark 4.4. Already the results in the case without uncertainty in Costantini et al. (2012) show that these results do not generalize to R ≥0 , because then the Lipschitz property on compact subsets which is crucially used in the proof will no longer be satisfied. Moreover, Example 1 in Amadori (2007) shows that the condition b 0 ≥ā 1 /2 > 0 is indeed necessary for uniqueness.
It is time to illustrate the established results with some first motivating examples. We will begin with classical affine processes under parameter uncertainty, leading to non-linear affine processes. Thereafter we show how to extend beyond this case and introduce classes of non-linear affine processes which do not have a classical counterpart.

4.2. The non-linear Vasiček model. The first example of an affine model is the so-called Vasiček model, see for example Filipović (2009). It is a Gaussian Ornstein-Uhlenbeck process and is obtained as the strong solution of the SDE in Equation (1) by considering $a_1 = 0$. Introducing parameter uncertainty we arrive at the non-linear affine process with $[\underline{a}_1, \bar{a}_1] = \{0\}$. We call this case the non-linear Vasiček model.
While in the case with no parameter uncertainty this model can be characterized efficiently by its Fourier transforms and the associated Riccati equation, this is no longer possible here (except for the special case where $\underline{b}_1 = \bar{b}_1$, see Remark 4.7 below) and one has to rely on numerical techniques. In one dimension, this does not pose a problem at all, see for example Heider (2010). To illustrate this, we solve the equation for the simplest pay-off, $f(x) = e^x$. The result is shown in Figure 1.
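As a rough indication of how such a numerical solution can be obtained, the sketch below applies an explicit, monotone finite-difference scheme to the non-linear Kolmogorov equation of the non-linear Vasiček model. This is not the method of Heider (2010) nor necessarily the one behind Figure 1; the truncated domain, the frozen boundary values, the upwind treatment of the drift and the name `solve_nonlinear_vasicek` are all assumptions made for illustration.

```python
import numpy as np


def solve_nonlinear_vasicek(payoff, b0, b1, a0, T=0.5, x_max=3.0, nx=121, nt=100):
    """Explicit scheme for -dv/dt - sup_theta (generator) v = 0, v(T, .) = payoff.

    b0, b1, a0 are (lower, upper) pairs; a1 is fixed to zero.
    Returns the grid and the approximation of v(0, .).
    """
    x = np.linspace(-x_max, x_max, nx)
    dx, dt = x[1] - x[0], T / nt
    # Endpoints of the drift interval b*(x) on the grid.
    b_lo = b0[0] + np.where(x >= 0, b1[0], b1[1]) * x
    b_hi = b0[1] + np.where(x >= 0, b1[1], b1[0]) * x
    b_abs = max(np.abs(b_lo).max(), np.abs(b_hi).max())
    if dt * (a0[1] / dx ** 2 + b_abs / dx) > 1.0:
        raise ValueError("time step too large for a monotone explicit scheme")
    v = np.array(payoff(x), dtype=float)
    lo, hi = b_lo[1:-1], b_hi[1:-1]
    # Upwind drift term is piecewise linear in b: check endpoints and the kink at 0.
    candidates = [lo, hi, np.minimum(np.maximum(0.0, lo), hi)]
    for _ in range(nt):
        fwd = (v[2:] - v[1:-1]) / dx
        bwd = (v[1:-1] - v[:-2]) / dx
        curv = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx ** 2
        drift = np.max(
            [np.maximum(b, 0.0) * fwd + np.minimum(b, 0.0) * bwd for b in candidates],
            axis=0,
        )
        diffusion = 0.5 * (a0[1] * np.maximum(curv, 0.0) + a0[0] * np.minimum(curv, 0.0))
        # Backward in time: v(t - dt) = v(t) + dt * G; boundary values stay frozen.
        v[1:-1] = v[1:-1] + dt * (drift + diffusion)
    return x, v
```

With degenerate intervals the scheme reduces to a solver for the classical Vasiček model, whose value $e^{\phi(T,1) + \psi(T,1)x}$ is available from the Riccati equations and can serve as a benchmark near the centre of the grid.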
This example combines and encompasses the following two well-known non-linear processes: g-Brownian motion and G-Brownian motion, see Peng (1997) and Peng (2007a), for example, and Neufeld and Nutz (2017) for the case with jumps. In the following we will also elaborate on the case where explicit solutions can be obtained, see Proposition 4.6 and Remark 4.7.
4.3. The non-linear CIR model. While the Gaussianity of the Vasiček model immediately implies that the process becomes negative with positive probability, this is inappropriate for various applications, e.g. in credit risk. There, the considered affine process models an intensity, which by definition has to be non-negative. Positive interest rates were likewise regarded as indispensable for a model before the crises of 2008-2010. The Cox-Ingersoll-Ross (CIR) model serves as an affine model with state space $\mathbb{R}_{\ge 0}$ and therefore satisfies these needs. It is obtained by choosing $a_0 = 0$ and $a_1 > 0$ in the SDE in (1).
The CIR model under parameter uncertainty, which we call the non-linear CIR model, is obtained by considering the state space O = R >0 and assuming that a 0 =ā 0 = 0, b 0 ≥ā 1 /2 > 0. It is remarkable how far-reaching the positivity of X will be: indeed, a first look at the non-linear Kolmogorov equation already reveals that for increasing and convex functions the supremum in Equation (17)  from this problem and we show in the following that in important special cases it can be computed explicitly. However, inversion techniques can no longer be applied in the nonlinear setting and the Laplace transform merely serves as a prime example of an increasing convex function for which the non-linear expectations can be computed explicitly in special cases (and for decreasing concave functions of course).
Remark 4.5. In principle, the non-linear CIR model can be extended to the whole space, i.e. choosing $\mathcal{O} = \mathbb{R}$ is possible, since in Equation (4) negative values pose no problem. On the negative half-line the dynamics of the model have no diffusive part and the process moves only through the drift (which of course can still be stochastic). This could happen with a positive (upper) mean-reversion level and by starting from a negative value.
We begin with a general result for affine processes which classifies when the classical representation via Riccati equations still holds.
Proposition 4.6. Consider a non-linear affine process A(x, Θ) and assume that for all P ∈ A(x, Θ) dP ⊗ dt-almost everywhere for 0 ≤ t ≤ T . Moreover, assume either that a 1 =ā 1 = 0 or that for all P ∈ A(x, Θ), X t ≥ 0 P ⊗ dt-a.e. If there exists aP ∈ A(x, Θ) and aP -F-Brownian motion W such that the canonical process underP is the unique strong solution of then, for all u ≥ 0 and 0 ≤ t ≤ T , where φ, ψ solve the Riccati equations This result is obtained by showing that the supremum in the non-linear expectation is obtained by the maximal semimartingale lawP which corresponds to an affine process with parametersā 0 ,ā 1 ,b 0 ,B 1,x . Inspection of the proof shows that this property also holds when e ux is replaced by any other increasing and convex function (a Call payoff, for example). An analogous formulation for u < 0 of course also holds. Furthermore, it is interesting to see that the Riccati equations in Equations (37) and (38) can be replaced by respective versions thereof.
Proof. The claim follows by an application of Theorem 2.2 in Bergenthum and Rüschendorf (2007). This needs validity of the so-called propagation of order (PO) property and we give a detailed account of this in Appendix A. Proposition A.4 in particular yields that the PO property is satisfied for increasing and convex (decreasing and concave) functions when compared to an affine process.
Let $P \in \mathcal{A}(x, \Theta)$. By assumption there exist a $\bar{P} \in \mathcal{A}(x, \Theta)$ and a $\bar{P}$-$\mathbb{F}$-Brownian motion $W$ such that $X$ is the unique strong solution of the stochastic differential equation (36) under $\bar{P}$. Since $e^{ux}$ for $u \ge 0$ is increasing and convex and (35) holds, Theorem 2.2 in Bergenthum and Rüschendorf (2007) yields that $E^P[e^{uX_t}] \le E^{\bar{P}}[e^{uX_t}]$.
Since P ∈ A(x, Θ) was arbitrary and, underP , X is an affine process with parameters a 0 ,ā 1 ,b 0 ,B 1,x , the affine representation follows and the Riccati equations in (37) can be obtained directly from Theorem 10.1 in Filipović (2009).
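For the non-linear CIR model the resulting upper bound can be evaluated in closed form. The sketch below assumes that the maximal law corresponds to the upper parameter bounds $(\bar{b}_0, \bar{b}_1, \bar{a}_1)$ with $a_0 = 0$, that $\bar{b}_1 \neq 0$, and that $u \ge 0$ is small enough for the exponential moment to be finite; the function names are illustrative and the closed form is the standard solution of the Bernoulli-type equation $\partial_t\psi = b_1\psi + \tfrac12 a_1\psi^2$.

```python
import math


def cir_riccati(u, t, b0, b1, a1):
    """Closed-form (phi, psi) for dphi/dt = b0*psi, dpsi/dt = b1*psi + a1*psi**2/2,
    with phi(0) = 0, psi(0) = u; requires b1 != 0 and a1 > 0."""
    denom = 1.0 - (a1 * u / (2.0 * b1)) * math.expm1(b1 * t)
    if denom <= 0.0:
        raise ValueError("exponential moment is not finite at time t")
    psi = u * math.exp(b1 * t) / denom
    phi = -(2.0 * b0 / a1) * math.log(denom)
    return phi, psi


def upper_exp_moment(u, t, x, theta_bounds):
    """Candidate for sup_P E^P[exp(u X_t)], u >= 0, x > 0, via the upper bounds."""
    (_b0_lo, b0_hi), (_b1_lo, b1_hi), _a0, (_a1_lo, a1_hi) = theta_bounds
    phi, psi = cir_riccati(u, t, b0_hi, b1_hi, a1_hi)
    return math.exp(phi + psi * x)
```

A quick plausibility check is that the value computed from the upper bounds dominates the value computed from any other parameter choice in $\Theta$, for instance the lower bounds.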
Remark 4.7. It can be easily checked that the conditions for Proposition 4.6 hold in the following two cases: (1) the non-linear Vasiček model with state space $\mathcal{O} = \mathbb{R}$ and $\bar{b}_1 = \underline{b}_1$, (2) the non-linear CIR model on the state space $\mathcal{O} = \mathbb{R}_{>0}$.
By the above result we can use the classical Fourier-inversion technique for these affine processes when pricing increasing and convex payoffs (like Call options) or decreasing and concave ones, see Section 10.3 in Filipović (2009) for examples and details in this direction. In Example 5.6 we will sketch an application in a Heston model with parameter uncertainty in the stochastic volatility.

An Itô-formula for non-linear affine processes
In this section we will construct new processes from non-linear affine processes by simple transformations. The main tool for this will be a suitable formulation of the Itô-formula in our setting.
Consider a twice continuously differentiable function $F \in C^2(\mathbb{R})$. If we start from a non-linear affine process $\mathcal{A}(x,\Theta)$ and consider $\tilde X := F(X)$, then for any $P \in \mathcal{A}(x,\Theta)$ the process $\tilde X$ is a $P$-semimartingale and we denote its (differential) semimartingale characteristics by $\tilde\alpha$ and $\tilde\beta^P$ (starting from $\alpha$ and $\beta^P$ from Equality (2)). In this section we answer the question whether the non-linear process $\tilde X$ itself, i.e. the associated family of semimartingale laws, can be studied independently of $X$. This corresponds one-to-one to the question whether there exists an independent formulation of the non-linear process $\tilde X$. The following proposition gives a positive answer to this question.
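We define the interval-valued functions $a^F$ and $b^F$ by
\[ a^F(x) := \big\{ F'(x)^2\, a : a \in a^*(x) \big\}, \qquad b^F(x) := \big\{ F'(x)\, b + \tfrac12 F''(x)\, a : b \in b^*(x),\ a \in a^*(x) \big\}. \]
The non-linear process $\tilde X$ inherits certain bounds from $X$, which are characterized in the following proposition.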
Proposition 5.1. Let $\mathcal{A}(x,\Theta)$ be a non-linear affine process and $F \in C^2(\mathbb{R})$. Then, for every $P \in \mathcal{A}(x,\Theta)$, $\tilde X = F(X)$ is a $P$-semimartingale with differential characteristics $\tilde\alpha$ and $\tilde\beta^P$ satisfying $\tilde\alpha_s \in a^F(X_s)$ and $\tilde\beta^P_s \in b^F(X_s)$.

Proof. Let $t \in [0,T]$. By definition, $P \in \mathcal{A}(x,\Theta)$ implies that
\[ X_t = x + \int_0^t \beta^P_s\,ds + M^P_t, \]
where $\beta^P_s \in b^*(X_s)$, $\alpha_s \in a^*(X_s)$ and $M^P$ is the continuous local martingale part in the $P$-semimartingale decomposition of $X$. As previously, $\langle M^P \rangle = \int_0^\cdot \alpha_s\,ds$. Since $F \in C^2(\mathbb{R})$, the Itô formula yields that
\[ F(X_t) = F(x) + \int_0^t \Big( F'(X_s)\,\beta^P_s + \tfrac12 F''(X_s)\,\alpha_s \Big)\,ds + \int_0^t F'(X_s)\,dM^P_s. \]
Now the properties $\tilde\alpha_s \in a^F(X_s)$ and $\tilde\beta^P_s \in b^F(X_s)$ can be checked directly.

Remark 5.2. Intuitively, the above result allows one to construct the non-linear process $\tilde X = F(X)$ when $X$ is non-linear affine. The new bounds for the (differential) semimartingale characteristics are given by $b^F(X)$ and $a^F(X)$, respectively. However, the drift and volatility of $\tilde X$ now relate to each other, which often gives a substantially smaller class in comparison to all semimartingale laws whose drift and volatility stay in $b^F(X_t)$ and $a^F(X_t)$.
In general, (non-linear) affine processes are stable under affine transformation. The following example shows that we may even consider the non-linear transformation $F(x) = x^2$, at least in some special cases.
Example 5.3. Let $\mathcal{A}(x,\Theta)$ be a non-linear Vasiček model satisfying $\underline{b}^0 = \bar b^0 = 0$, and $\tilde X = F(X) = X^2$. We apply Proposition 5.1: first, note that since $F'' = 2 > 0$,
\[ b^F(x) = \big[\, 2\underline{b}^1 x^2 + \underline{a}^0,\ 2\bar b^1 x^2 + \bar a^0 \,\big] \]
and $a^F(x) = [4x^2 \underline{a}^0, 4x^2 \bar a^0]$. Then, $b^F$ and $a^F$ can even be written as functions of $\tilde X = X^2$. This would not be the case if $\underline{b}^0 = \bar b^0 = 0$ does not hold, since $b^F$ would depend on $x$ (which is not a function of $x^2$). Under this observation we may directly study the semimartingale characteristics of $\tilde X$. Replacing $x^2$ by $\tilde x$ in $b^F$ and $a^F$ we indeed observe an affine structure and it is tempting to conjecture that we obtained a non-linear CIR model. In general, this is not the case: for simplicity, choose $\underline{a}^0 = \underline{b}^1 = 0$, $\bar a^0 = \bar b^1 = 1$ and $x = 1$. Then $b^F(1) = [0, 3]$ and $a^F(1) = [0, 4]$. For a non-linear CIR model, any choice of $(\tilde\beta, \tilde\alpha)$ in $b^F \times a^F$ should be possible. Now choose, say, $\tilde\alpha = 4$ (corresponding to a maximal volatility of $\alpha = 1$ in the original model). Then not all choices of $\tilde\beta \in [0, 3]$ are reached by the original model: indeed, one immediately obtains from (43) that $\tilde\beta$ needs to lie in $[1, 3]$.
In the case where only one parameter (either $\alpha$ or $\beta$) carries uncertainty, this problem of course vanishes. This is the case for the existing transformations of g- and G-Brownian motion in the literature, and we provide further examples in this direction below.
The above example also illustrates that non-linear transformations of processes under ambiguity should be handled with care. The following example shows how to obtain a geometric kind of dynamics, which allows us to obtain the non-linear Black-Scholes model as considered in Epstein and Ji (2013, Example 3) and Vorbrink (2014). Both works consider the case where there is only volatility uncertainty.
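The numbers in Example 5.3 can be checked mechanically. The sketch below applies the Itô formula for $F(x) = x^2$, i.e. drift $2x\beta + \alpha$ and squared volatility $4x^2\alpha$, and lists the drifts of $X^2$ that remain reachable once the squared volatility of $X^2$ is fixed. The helper names and the grid size are hypothetical and only serve this illustration, and the case $\underline{b}^0 = \bar b^0 = 0$ is assumed throughout.

```python
def squared_process_characteristics(x, beta, alpha):
    """Characteristics of X^2 at state x, given drift beta and squared
    volatility alpha of X (Ito formula with F' = 2x and F'' = 2)."""
    return 2.0 * x * beta + alpha, 4.0 * x * x * alpha

def reachable_drifts(x, b1_lo, b1_hi, a0_lo, a0_hi, alpha_tilde, n=101):
    """Drifts of X^2 that are attainable at state x when the squared
    volatility of X^2 equals alpha_tilde (non-linear Vasicek with b0 = 0)."""
    alpha = alpha_tilde / (4.0 * x * x)      # implied squared volatility of X
    if not (a0_lo <= alpha <= a0_hi):
        return []                            # alpha_tilde is not attainable at all
    lo, hi = sorted((b1_lo * x, b1_hi * x))  # drift interval b*(x) of X
    betas = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return [squared_process_characteristics(x, b, alpha)[0] for b in betas]

# Setting of Example 5.3: b1 in [0, 1], a0 in [0, 1], x = 1, alpha_tilde = 4.
drifts = reachable_drifts(1.0, 0.0, 1.0, 0.0, 1.0, 4.0)
print(min(drifts), max(drifts))
```

With these inputs the reachable drifts fill the interval between 1 and 3 rather than all of $b^F(1) = [0, 3]$, which is the coupling between drift and volatility described in the example.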
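Example 5.4. Let $\mathcal{A}(x,\Theta)$ be a non-linear affine process and consider $F(x) = e^x$. Again, we apply Proposition 5.1. First, note that with $\tilde x = e^x$, $a^F(x) = (e^x)^2\, a^*(x)$.
Moreover, since $a^*(x) = [\underline{a}^0 + \underline{a}^1 x^+, \bar a^0 + \bar a^1 x^+]$, we obtain
\[ \tilde a(\tilde x) = \big[\, \tilde x^2 \big(\underline{a}^0 + \underline{a}^1 (\log \tilde x)^+\big),\ \tilde x^2 \big(\bar a^0 + \bar a^1 (\log \tilde x)^+\big) \,\big]. \]
In a similar manner, one obtains $\tilde b$ from (40), noting that $F'(x) = F''(x) = e^x$. The state space of $e^X$ is of course $\mathbb{R}_{>0}$.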
Example 5.5 (The non-linear Black-Scholes model). Allowing for drift and volatility uncertainty in the log-price of a stock, one arrives at a non-linear Black-Scholes model. We consider a Brownian motion with drift and volatility uncertainty, which is in our language a non-linear Vasiček model with $\underline{b}^1 = \bar b^1 = 0$. Furthermore, we assume that the stock price is given by $S = \exp(X)$, i.e. $F(x) = e^x$. Then, the calculations from the previous example immediately yield that the stock price is given by the non-linear process $\tilde X$ where
\[ \tilde a(\tilde x) = \big[\, \tilde x^2 \underline{a}^0,\ \tilde x^2 \bar a^0 \,\big] \quad\text{and}\quad \tilde b(\tilde x) = \big[\, \tilde x\, \underline{b}^0 + \tfrac12 \tilde x\, \underline{a}^0,\ \tilde x\, \bar b^0 + \tfrac12 \tilde x\, \bar a^0 \,\big]. \]
Option pricing for monotone convex (concave) pay-offs can immediately be done by Proposition 4.6, see Example 3 in Epstein and Ji (2013) for explicit formulae for call options (with no uncertainty of the drift). The article Vorbrink (2014) excludes drift uncertainty by arguing that under risk-neutral pricing the drift is known.
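To make the last statement of Example 5.5 concrete, here is a small sketch of the upper call price in the non-linear Black-Scholes model. It assumes that the payoff $x \mapsto (e^x - K)^+$ is treated as an increasing and convex function of the log-price, so that by Proposition 4.6 the supremum is attained at the upper drift $\bar b^0$ and the upper squared volatility $\bar a^0 > 0$, under which $X_T$ is Gaussian with mean $x + \bar b^0 T$ and variance $\bar a^0 T$. Discounting is ignored, and all function names and numbers are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def call_on_exp_gaussian(x, b0, a0, T, K):
    """E[(exp(X_T) - K)^+] for X_T ~ N(x + b0*T, a0*T), with a0 > 0
    denoting the squared volatility of the log-price."""
    m = x + b0 * T
    v = a0 * T
    s = math.sqrt(v)
    d2 = (m - math.log(K)) / s
    d1 = d2 + s
    return math.exp(m + 0.5 * v) * norm_cdf(d1) - K * norm_cdf(d2)

def upper_call_price(x, b0_up, a0_up, T, K):
    """Upper price: evaluate the Gaussian formula at the upper parameters."""
    return call_on_exp_gaussian(x, b0_up, a0_up, T, K)

# Hypothetical parameter set: b0 in [-0.05, 0.05], a0 in [0.02, 0.06].
print(upper_call_price(0.0, 0.05, 0.06, 1.0, 1.0))
```

A brute-force comparison over a grid of admissible constant parameters is a cheap way to confirm that no constant choice inside the parameter set exceeds the value at the upper corner.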
Example 5.6 (The Heston model with uncertainty in the volatility parameters). The model put forward in Heston (1993) is one of the most popular models for stochastic volatility and is also heavily used in foreign exchange markets. Model and calibration risk is an important issue, see for example Guillaume and Schoutens (2012) in this regard. Here we give a short outline of how a non-linear version could be constructed, allowing for parameter uncertainty in the volatility only (and not in the drift of the stock price or in the correlation of volatility and stock price). In this regard, we extend $\Omega$ in the classical way to construct an additional (independent) Brownian motion $\tilde V$, which allows us to construct two correlated Brownian motions $V$ and $W$. The correlation is fixed and denoted by $\rho$. Each $P \in \mathcal{P}(\Omega)$ is extended by leaving $\tilde V$ untouched, such that $(V, W)$ will be a two-dimensional Brownian motion where $V$ and $W$ have correlation $\rho$, and we denote this new semimartingale law again by $P$.
Consider a non-linear CIR process $\mathcal{A}(x,\Theta)$ with state space $O = \mathbb{R}_{>0}$ as introduced in Section 4.3. The stock price $S$ is given by the strong solution of the SDE
\[ dS_t = S_t \sqrt{X_t}\, dV_t, \]
where $X$ is the canonical process on $\Omega$ (and hence a non-linear process). Hence the volatility $X$ stems from a non-linear CIR model, which means intuitively that we have a CIR model with parameter uncertainty with upper and lower bounds $\bar b^0, \bar b^1, \bar a^1$ and $\underline{b}^0, \underline{b}^1, \underline{a}^1$, respectively. For simplicity we chose a vanishing risk-free rate of interest. In the following we show how to compute a call price in this non-linear Heston model. The call price $C(T, K)$ for maturity $T \le T^*$ and strike $K > 0$ is given by the supremum of the expectations $E_P[(S_T - K)^+]$ over all (extended) semimartingale laws $P$ from $\mathcal{A}(x,\Theta)$.
Since the pay-off function $(s - K)^+$ is increasing and convex, the arguments of Proposition 4.6 apply and $C(T, K) = E_{\bar P}[(S_T - K)^+]$, where $\bar P$ is the worst-case semimartingale law which achieves the supremum. Again from the proof of Proposition 4.6 we find that under $\bar P$, $X$ is a (classical) CIR process with parameters $\bar b^0, \bar b^1, \bar a^1$. The call price formula can be found in Heston (1993), see also Section 10.3.3 in Filipović (2009) for a derivation using Fourier inversion techniques.

Affine term structure models
One of the most important applications of affine models is in term structure models. In this regard, we provide in the following a term-structure equation for non-linear affine models, implying prices for derivatives or bond prices.
Consider a payoff $f(X_T)$ taking place at time $T > 0$. In the classical setting, arbitrage-free prices are given by expectations of the discounted pay-off under a risk-neutral measure. According to the superhedging duality in Biagini et al. (2017, Theorem 5.1), in the case we consider here, when there is a family of such measures, upper bounds of these price processes (and hence the smallest superhedging price) given $X_t = x$ are given by
\[ \sup_{P \in \mathcal{A}(t,x,\Theta)} E_P\Big[ e^{-\int_t^T X_s\,ds} f(X_T) \Big]. \]
The following result states the non-linear term-structure equation for the pay-off $f(X_T)$.
For a proof of this result one can argue in the same way as in the proof of Theorem 4.1. More precisely, the dynamic programming principle yields the corresponding identity for any stopping time $\tau$ taking values in $[t, T]$; following the arguments in Theorem 4.1 then leads to the desired result. Alternatively, one could also enlarge the state space to transform this control problem, which is in Lagrange form, into one of Mayer form as in Proposition 3.3 and Theorem 4.1 (see for example Remark 3.10 in Bouchard and Touzi (2011)). The term-structure equation now allows one to obtain the bond prices by considering the pay-off $f(X_T) = 1$. We illustrate how an extension of the state space can be used to achieve a result similar to Proposition 4.6, leading to closed-form bond prices in special cases.
In our approach, upper bond prices under the non-linear affine term structure model $\mathcal{A}(t,x,\Theta)$, $x \in O$, are given by
\[ \bar p(t,T,x) := \sup_{P \in \mathcal{A}(t,x,\Theta)} E_P\Big[ e^{-\int_t^T X_s\,ds} \Big], \]
conditional on $X_t = x$. For arbitrary $\omega \in \Omega$ one obtains the bond price as $\bar p(t,T)(\omega) = \bar p(t,T,\omega_t)$. The respective lower bond price $\underline{p}(t,T,x)$ is obtained by replacing the supremum with an infimum. The following proposition shows that in important special cases these prices can be obtained in closed form. Again, for $x \in O$ recall (33) and (34), and recall $\bar B^{1,x} = \underline{b}^1 1_{\{x<0\}} + \bar b^1 1_{\{x \ge 0\}}$, so that $\bar b(x) = \bar b^0 + \bar B^{1,x} x$. Moreover, recall from (36) the affine process with coefficients $\bar a^0, \bar a^1, \bar b^0, \bar B^{1,x}$.
Proposition 6.2. Consider a non-linear affine process $\mathcal{A}(x,\Theta)$, assume for all $P \in \mathcal{A}(x,\Theta)$ that $dP \otimes dt$-almost everywhere for $0 \le t \le T$ and assume that either $\underline{a}^1 = \bar a^1 = 0$ or that, for all $P \in \mathcal{A}(x,\Theta)$, $X_t \ge 0$ $dP \otimes dt$-a.e. If there exist a $\bar P \in \mathcal{A}(x,\Theta)$ and a $\bar P$-$\mathbb{F}$-Brownian motion $W$ such that the canonical process under $\bar P$ is the unique strong solution of (36), then, for all $u \ge 0$ and $0 \le t \le T$, the upper bond price $\bar p$ admits an exponential affine representation in terms of functions $\phi$, $\psi$ which solve the Riccati equations (50)-(51).

Proof. This result also follows by using semimartingale comparison. In this regard let $P \in \mathcal{A}(x,\Theta)$ and consider the two-dimensional process $Y = (Y^1, Y^2)$ where $Y^1 = -\int_0^\cdot X_s\,ds$ and $Y^2 = X$. Then, there is no parameter uncertainty with respect to the dynamics of $Y^1$ since its differential semimartingale characteristics are obtained from $dY^1_t = -X_t\,dt$. By assumption there exist a $\bar P \in \mathcal{A}(x,\Theta)$ and a $\bar P$-$\mathbb{F}$-Brownian motion $W$ such that $X$ is the unique strong solution of the stochastic differential equation (36). Denote by $\beta, \alpha$ the differential semimartingale characteristics under $P$ of the two-dimensional semimartingale $Y$ and by $\bar\beta, \bar\alpha$ those of $Y$ under $\bar P$. It is easily verified that $\beta_t \le \bar\beta_t$ and also that $\alpha_t \le_{\mathrm{psd}} \bar\alpha_t$ in the positive semidefinite order.
Since $(y_1, y_2) \mapsto e^{u_1 y_1}$ for $u \ge 0$ is increasing and convex, Theorem 2.2 in Bergenthum and Rüschendorf (2007) (the propagation of order (PO) property is shown in Appendix A) yields that $E_P\big[e^{-\int_0^t X_s\,ds}\big] \le E_{\bar P}\big[e^{-\int_0^t X_s\,ds}\big]$. Since $P \in \mathcal{A}(x,\Theta)$ was chosen arbitrarily and, under $\bar P$, $X$ is an affine process with parameters $\bar a^0, \bar a^1, \bar b^0, \bar B^{1,x}$, the affine representation follows and the Riccati equations (50)-(51) can be obtained directly from Theorem 10.4 in Filipović (2009).

Remark 6.3. Again, as in Remark 4.7, it can be easily checked that the conditions for Proposition 6.2 hold in the following two cases: (1) the non-linear Vasiček model with state space $O = \mathbb{R}$ and $\underline{b}^1 = \bar b^1$, (2) the non-linear CIR model on the state space $O = \mathbb{R}_{>0}$. It is remarkable that in the general non-linear Vasiček model with parameter uncertainty on the speed of mean reversion, the classical exponential affine bond pricing formula ceases to hold. Thus, in this model, the interval of parameters cannot directly be backed out from an interpolation of bid and ask prices with a standard Vasiček model. This, however, is the case in the CIR model and in the Vasiček model where $b^1$ is known.
The previous results and examples directly allow the treatment of term-structure models based on the non-linear Vasiček model and on the non-linear CIR model (see Sections 4.2 and 4.3). An important difficulty for the modeller in practical situations is that she has to make her choice between these models with strong implications: for example, the state space can allow for negative values or can strictly exclude them, a well-documented difficulty of the affine models after the European crisis, see Carver (2012). The following example shows that in a non-linear setting one is able to mix these two types of models and the modeller no longer has to decide a priori whether she allows or excludes negative values.
6.1. The non-linear Vasiček-CIR model. If one is not able to restrict the state space a priori, one can consider the following non-linear affine model: assume that both parameters $a^0$ and $a^1$ are subject to parameter uncertainty (or at least one of them with the other parameter not vanishing). Intuitively, this means that the model may switch between a Vasiček-like and a CIR-like behaviour. In particular, when one is not able to restrict the state space a priori to $\mathbb{R}_{\ge 0}$, this non-linear model allows one to incorporate both model approaches in a robust (i.e. non-linear) sense.
In the interest rate markets in the early years after 2000, market participants believed in positive interest rates and thus favoured the CIR-model. The credit crises led to decreasing interest rates and the Vasiček model came back, as it allows for negative interest rates. This effect is well-known and its implications for banking are quite important, see for example Carver (2012), Patel et al. (2017), Orlando et al. (2016), Russo and Fabozzi (2017). In the near future, however, when interest rates may rise again one could be interested in deviating again from the Vasiček-model. With the non-linear Vasiček-CIR model such a switch is no longer necessary and one is able to behave consistently through such seemingly different time periods.
More precisely, assume that $\underline{a}^0 > 0$ and consider the state space $O = \mathbb{R}$. Uniqueness for the non-linear PDE (16) follows readily from Proposition 4.2. In this general case, there will be no explicit solutions as, for example, in Proposition 6.2 above, and we have to rely on numerical techniques.

[Figure: the non-linear Vasiček, the non-linear CIR and the non-linear Vasiček-CIR model. The first two models are obtained from the latter by simply letting $\bar a^1 = 0$ ($\bar a^0 = 0$, respectively). The Call price has strike 0.5 and the parameters are given in the table above.]

Model risk
In financial applications, model risk is an important factor for risk management. In the remarkable work Cont (2006), a systematic framework for the management of model risk has been proposed, which we briefly recall and thereafter apply to the non-linear affine models. The importance of this topic is illustrated by the intensive research in this area, see for example Bannör and Scherer (2013), Guillaume and Schoutens (2012), Breuer and Csiszár (2016), Barrieu and Scandolo (2015), da Fonseca and Grasselli (2011) among many others.
In this approach, the market contains a number of benchmark instruments which are liquidly traded instruments and the observation consists in bid and ask prices thereof. Moreover, there is a set of arbitrage-free pricing models Q which is consistent with the observations of the benchmark instruments.
In our framework, both can be described through a non-linear affine model: the non-linear affine model specifies a set $\mathcal{Q}$ of pricing measures, as for example in the non-linear affine term-structure approach studied in Section 6. Consistent bid and ask prices can be obtained by suprema and infima over these pricing measures, exactly as it was done for $\bar p(t, T)$ and $\underline{p}(t, T)$ in the previous section.
A coherent measure of model uncertainty for a payoff function $\psi : \mathbb{R} \to \mathbb{R}$, with $\psi(X_T)$ denoting the payoff at time $T$, can be computed from the upper and lower price bounds
\[ \bar\pi(\psi) := \sup_{Q \in \mathcal{Q}} E_Q[\psi(X_T)], \qquad \underline{\pi}(\psi) := \inf_{Q \in \mathcal{Q}} E_Q[\psi(X_T)]. \]
The measure of model uncertainty on the derivative $\psi$ is given by
\[ \mu_{\mathcal{Q}}(\psi) := \bar\pi(\psi) - \underline{\pi}(\psi). \]
Some examples are provided in Cont (2006), including the non-linear Black-Scholes model. We illustrate the application of non-linear affine models in this framework with a short study of model risk in the non-linear Vasiček model.

7.1. The non-linear Vasiček model. As an example we consider the non-linear Vasiček model introduced in Section 4.2. Recall that this model is characterized by the assumption that $[\underline{a}^1, \bar a^1] = \{0\}$. In the non-linear case, non-linear expectations are given by the solution of the non-linear Kolmogorov equation (15). Proposition 4.6 allows one to trace this solution back to existing solutions for affine models if the payoffs are increasing and convex (decreasing and concave), see Example 5.6. For more general payoffs we rely on numerical methods, which we illustrate now.

7.1.1. Options. The model risk for options in this model is illustrated in the following two pictures, where we price a call and a butterfly. To construct the set $\Theta$, we take estimated parameter values together with their 95% confidence intervals from the literature. The result is shown in Figure 3. While for the Call option the model risk increases monotonically with the initial value, the maximal model risk for the butterfly is attained for the initial value $x$ directly at the maximal payoff.
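The text does not specify which numerical method is used, so the following is only one possible sketch: an explicit, monotone finite-difference scheme for the non-linear Kolmogorov equation of the non-linear Vasiček model, written in time to maturity as $\partial_\tau u = \sup_\theta \{ (b^0 + b^1 x)\,\partial_x u + \tfrac12 a^0\, \partial_{xx} u \}$ (with an infimum for the lower bound). The grid, the Dirichlet boundary values frozen at the payoff, the parameter intervals and all names are assumptions made for illustration; they are not the estimated values referred to above.

```python
import numpy as np

def price_bound(payoff, theta, T=1.0, x_min=-1.0, x_max=1.0, nx=201, upper=True):
    """Upper (or lower) bound of E[payoff(X_T) | X_0 = x] on a grid of x."""
    (b0_lo, b0_hi), (b1_lo, b1_hi), (a0_lo, a0_hi) = theta
    x = np.linspace(x_min, x_max, nx)
    dx = x[1] - x[0]
    # Endpoints of the drift interval b*(x); they depend on the sign of x.
    b_lo = b0_lo + np.minimum(b1_lo * x, b1_hi * x)
    b_hi = b0_hi + np.maximum(b1_lo * x, b1_hi * x)
    # The upwinded drift term is piecewise linear in b with a kink at b = 0,
    # so the clipped zero is added as a third candidate.
    b_zero = np.minimum(np.maximum(0.0, b_lo), b_hi)
    b_max = max(np.abs(b_lo).max(), np.abs(b_hi).max())
    # Step count chosen so that the explicit scheme stays monotone.
    n_steps = int(np.ceil(T * (b_max / dx + a0_hi / dx**2))) + 1
    dt = T / n_steps
    pick = np.maximum if upper else np.minimum
    u = np.asarray(payoff(x), dtype=float).copy()
    for _ in range(n_steps):
        d_plus = (u[2:] - u[1:-1]) / dx
        d_minus = (u[1:-1] - u[:-2]) / dx
        d2 = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        best = None
        for b in (b_lo, b_hi, b_zero):
            bi = b[1:-1]
            drift = np.maximum(bi, 0.0) * d_plus + np.minimum(bi, 0.0) * d_minus
            for a in (a0_lo, a0_hi):
                val = drift + 0.5 * a * d2
                best = val if best is None else pick(best, val)
        u[1:-1] += dt * best  # boundary values stay at the payoff
    return x, u

def model_uncertainty(payoff, theta, **kwargs):
    """Difference between the upper and the lower price bound on the grid."""
    x, up = price_bound(payoff, theta, upper=True, **kwargs)
    _, lo = price_bound(payoff, theta, upper=False, **kwargs)
    return x, up - lo

# Hypothetical intervals for (b0, b1, a0) and a butterfly payoff.
theta = ((0.0, 0.1), (-1.0, -0.5), (0.01, 0.02))
butterfly = lambda x: (np.maximum(x - 0.3, 0.0) - 2.0 * np.maximum(x - 0.5, 0.0)
                       + np.maximum(x - 0.7, 0.0))
x, mu = model_uncertainty(butterfly, theta)
```

Because the supremum and infimum are taken over the same finite set of candidate parameters and the time step respects the monotonicity bound, the computed difference is non-negative on the whole grid and vanishes when every interval collapses to a single point. Values near the boundary of the grid should not be trusted, since the boundary condition is a crude truncation.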

Conclusion
In this paper we introduced affine processes under uncertainty. This extends the existing class of non-linear Lévy processes to Markov processes where the interval for the parameter uncertainty may depend on the current state (in an affine way, however). We obtained a dynamic programming principle implying a non-linear Kolmogorov equation which can be used to price options in a fast and efficient way. Many existing models can be embedded into a setting with parameter uncertainty, which we illustrate with a number of examples. However, the non-linear framework also allows for new model variations which did not exist in the classical approach, and we illustrate this with a term-structure Vasiček-CIR model, where the modeller does not need to decide a priori whether the state space should include negative rates or not, a strong restriction in existing models. The generalization to higher dimensions or to the case with jumps is left for future research. Here, we concentrated on the conceptual introduction of state-dependent parameter uncertainty and chose the simplest but still highly interesting example for the illustration of our ideas.

Appendix A

We build on the results of Bergenthum and Rüschendorf (2007) on comparison of semimartingales with Markov processes and show that the crucial propagation of order (PO) property is satisfied for a large class of Markov processes, in particular for affine processes. To the best of our knowledge, existing results in the literature require Lipschitz assumptions on the coefficients, which will not hold in our case.
As function class $\mathcal{F}$ we will consider increasing and convex functions. For a real-valued Markov process $S^*$ and a terminal time $T$ we define the propagation operator
\[ G_g(t, x) := E\big[ g(S^*_T) \,\big|\, S^*_t = x \big]. \]
Assumption A.1. For some function class $\mathcal{F}$ and some Markov process $S^*$ we say that $PO(S^*, \mathcal{F})$ holds if $G_g(t, \cdot) \in \mathcal{F}$ for all $0 \le t \le T$ and for all $g \in \mathcal{F}$.
Propagation of monotonicity and convexity follows in a very elegant way through total positivity of the transition densities of continuous Markov processes.
Proposition 3.1 in Kijima (2002) immediately yields the following result.
Proposition A.2. Assume that $S^*$ is a strong Markov process having continuous sample paths and that $g$ is increasing (decreasing). Then $G_g(0, \cdot)$ is increasing (decreasing).
For the propagation of convexity we need an additional step, because the considered processes in Kijima (2002) are in fact martingales. The proof crucially uses the variation-diminishing property of totally positive functions.

Proposition A.3. Assume that $S^*$ is a strong Markov process with state space $S$ having continuous sample paths and that there exist $\pi_0, \pi_1 \in \mathbb{R}$ with $\pi_1 \neq 0$ such that (53) holds. Then for convex (concave) functions $g$ it holds that $G_g(0, \cdot)$ is convex (concave).
Proof. We modify the first step in Proposition 3.2 in Kijima (2002). In this regard, note that Equation (53) yields that Moreover, we follow the notation in Kijima (2002) and denote by q T (x, y) the transition density of the Markov process S * , i.e.
Consider in addition a different Markov process S, possibly on a different probability space and denote by F icx := {f : R → R, increasing and convex}.
A combination of Proposition A.2 with A.3 yields the following result.
Proposition A.4. Let $S^*$ be a strong and homogeneous Markov process with continuous sample paths and assume that (53) holds for every terminal time $t \in [0, T]$ with $\pi_1(t) \neq 0$. Then $PO(S^*, \mathcal{F}_{icx})$ holds.
Proof. Let $g \in \mathcal{F}_{icx}$. First, Proposition A.2 yields that $G_g(0, \cdot)$ is increasing. As the choice of $T$ was arbitrary, we obtain that $G_g(t, \cdot)$ is also increasing by repeating the argument of Proposition A.2 and using homogeneity of $S^*$. Second, Proposition A.3 yields that $G_g(0, \cdot)$ is convex. Denote $H_g(t, x) = E[g(S^*_t) \mid S^*_0 = x]$. Then, $G_g(t, x) = H_g(T - t, x)$ since the Markov process is homogeneous. Now Proposition A.3 yields that $H_g(t, \cdot)$ is convex since the choice of $T$ was arbitrary, and the claim follows.

Appendix B. Comparison results
In this section we recall the comparison results from Amadori (2007) in our notation. Again, the crucial point for these results is that Lipschitz assumptions on the full domain do not hold. Note that we only consider the one-dimensional, time-homogeneous case here, which simplifies the matter significantly. While minimization is the core topic of Amadori (2007), the financial applications mainly treat maximization, so we concentrate on maximization. The stated results follow from the original results by replacing $\psi$ with $-\psi$.
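Fix the state space $O = \mathbb{R}_{>0}$ and consider the controlled diffusion $X = X^\theta$,
\[ dX_s = b(X_s, \theta_s)\,ds + \sqrt{a(X_s, \theta_s)}\,dW_s, \qquad s > t, \qquad (57) \]
with initial condition $X_t = x \in O$. Our application will be in Proposition 4.3, which considers the non-linear CIR model. Since then $\underline{a}^0 = \bar a^0 = 0$, we consider $\Theta = [\underline{b}^0, \bar b^0] \times [\underline{b}^1, \bar b^1] \times [\underline{a}^1, \bar a^1]$, with $b(x, \theta) = b^0 + b^1 x$ and $a(x, \theta) = a^1 x$ for $\theta = (b^0, b^1, a^1)$, and the value function
\[ v(t, x) := \sup_{\theta} E_{t,x}\big[ \psi(X_T) \big], \]
where the supremum ranges over all adapted processes $(\theta_s)$ taking values in $\Theta$, $X$ is the controlled diffusion satisfying (57), and $E_{t,x}$ refers to the conditional expectation conditioning on $X_t = x$. The associated Hamilton-Jacobi-Bellman equation is given in (16).

First, Assumption 1 in Amadori (2007) holds. Indeed, note that the functions $f$ and $r$ therein are equal to zero in our case and that the functions $b$ and $\sqrt{a}$ are Lipschitz-continuous on $[\varepsilon, \infty)$ for all $\varepsilon > 0$ and all $\theta \in \Theta$. Moreover, let $\| \cdot \|_{C^{0,1}([\varepsilon,\infty))}$ denote the Lipschitz coefficient on $[\varepsilon, \infty)$; then clearly $\sup\{ \| a(\cdot, \theta) \|_{C^{0,1}([\varepsilon,\infty))} : \theta \in \Theta \} < \infty$, and similarly for $b$, so that all conditions of Assumption 1 hold.

Second, Assumption 4 is implied by the Feller condition $\underline{b}^0 \ge \bar a^1/2 > 0$. Indeed, Assumption 4 requires that
\[ \limsup_{x \to 0}\, \sup_{\theta \in \Theta} \Big( \frac{1}{x} - \frac{2 b(x, \theta)}{a(x, \theta)} \Big) < \infty. \]
Since, under the Feller condition,
\[ \frac{1}{x} - \frac{2 b(x, \theta)}{a(x, \theta)} = \frac{1}{x}\Big( 1 - \frac{2 b^0}{a^1} \Big) - \frac{2 b^1}{a^1} \le -\frac{2 b^1}{a^1}, \]
Assumption 4 also holds.

In our notation, the uniqueness result following immediately from the comparison principle given in Theorem 4 in Amadori (2007) reads as follows. We refer to Section 4 for the definitions of viscosity solutions, super- and subsolutions.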
(i) Assume Let u be a locally bounded viscosity solution satisfying (58), then u = v.
(ii) Let u be a locally bounded viscosity solution satisfying (59), then u = v.
Proof. Recall that a viscosity solution is a supersolution and a subsolution. Since u and v are both viscosity solutions, u = v holds on O × {T }. For (i), note that the conditions hold with γ = 1 both for u and v, such that applying Theorem 4 in Amadori (2007) twice (once as a supersolution and once as a subsolution) yields u = v on O × [0, T ]. The claim (ii) follows similarly.
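The role of the Feller condition in verifying Assumption 4 can also be seen numerically. The sketch below evaluates $\sup_\theta \big( 1/x - 2b(x,\theta)/a(x,\theta) \big)$ for the CIR coefficients $b(x,\theta) = b^0 + b^1 x$ and $a(x,\theta) = a^1 x$ at a small $x > 0$. The supremum is taken over the corners of the parameter box, which suffices here because the expression is monotone in each parameter separately. Function names and parameter values are hypothetical.

```python
from itertools import product

def assumption4_sup(x, b0_rng, b1_rng, a1_rng):
    """Supremum over the corners of Theta of 1/x - 2*b(x, theta)/a(x, theta)
    with b = b0 + b1*x and a = a1*x (CIR case, x > 0, a1 > 0)."""
    return max(
        1.0 / x - 2.0 * (b0 + b1 * x) / (a1 * x)
        for b0, b1, a1 in product(b0_rng, b1_rng, a1_rng)
    )

def feller_holds(b0_rng, a1_rng):
    """Feller-type condition: lower bound of b0 >= (upper bound of a1)/2 > 0."""
    return min(b0_rng) >= max(a1_rng) / 2.0 > 0.0

# With the condition satisfied the quantity stays bounded above as x -> 0 ...
ok = assumption4_sup(1e-6, (0.05, 0.10), (-1.0, -0.5), (0.04, 0.08))
# ... whereas it grows like 1/x once the lower bound of b0 is too small.
bad = assumption4_sup(1e-6, (0.01, 0.10), (-1.0, -0.5), (0.04, 0.08))
print(feller_holds((0.05, 0.10), (0.04, 0.08)), ok, bad)
```

In the first case the value is bounded by the largest corner value of $-2b^1/a^1$, in line with the estimate used to verify Assumption 4.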