Nonlinear Regression without i.i.d. Assumption

In this paper, we consider a class of nonlinear regression problems without the assumption that the data are independent and identically distributed (i.i.d.). We propose a corresponding mini-max problem for nonlinear regression and outline a numerical algorithm. The algorithm can be applied to regression and machine learning problems and, in the experiments reported here, yields better results than the traditional least squares and average-loss methods.


Introduction
In statistics, linear regression is a linear approach to modelling the relationship between a response variable y and one or more explanatory variables, denoted x, in the form $y = w^{T}x + b + \varepsilon$, where $\varepsilon$ is a random error term.
The parameters w, b can be estimated via the method of least squares.
The first clear and concise exposition of the method of least squares was published by Legendre [6] in 1805. Later in 1809, Gauss published his method of calculating the orbits of celestial bodies. In that work he claimed to have been in possession of the method of least squares since 1795. Here is the basic theorem of linear regression.
When the i.i.d. (independent and identically distributed) assumption is not satisfied, the usual method of least squares does not work well. This can be illustrated by the following example. We can see from the graph that most of the sample data deviate from the regression line. The main reason is that $(x_1, y_1), (x_2, y_2), \dots, (x_{500}, y_{500})$ are the same sample, so the i.i.d. condition is violated.
In light of this, Lin et al. [7] study linear regression without the i.i.d. condition by using the nonlinear expectation framework laid down by Peng [9]. They split the training set into several groups such that the i.i.d. condition is satisfied within each group. The average loss is used for each group, and the maximum of the average losses among the groups is used as the final loss function. They show that the linear regression problem under the nonlinear expectation framework reduces to a mini-max problem, referred to below as problem (1). They propose a genetic algorithm to solve this problem. However, the algorithm does not work well in general.
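For orientation, one way to write a mini-max problem of the kind just described is sketched here. The notation is assumed for this sketch and is not taken from [7]: there are N groups, group j contains $n_j$ samples $(x_{jl}, y_{jl})$, and the per-sample loss is the squared error of ordinary least squares.

```latex
\min_{w,\,b}\; \max_{1 \le j \le N}\; \frac{1}{n_j} \sum_{l=1}^{n_j} \bigl( w^{T} x_{jl} + b - y_{jl} \bigr)^{2}
```

With a single group (N = 1) this expression reduces to the ordinary least squares objective.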
Motivated by the work of Lin et al. [7] and Peng [9], we consider nonlinear regression problems without the i.i.d. assumption in this paper. We propose a corresponding mini-max problem and outline a numerical algorithm for solving it, based on the work of Kiwiel [4]. Problem (1) from [7] can also be solved well by this algorithm. We also report experiments on regression and machine learning problems.

Nonlinear Regression without i.i.d. Assumption
Nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more explanatory variables (see, e.g., [10]). Suppose the sample data (training set) consists of pairs $(x_i, y_i)$, where $x_i \in X$ and $y_i \in Y$. X is called the input space and Y is called the output (label) space. The goal of nonlinear regression is to find (learn) a function from X to Y whose values at the inputs $x_i$ are close to the outputs $y_i$. The closeness is usually characterized by a loss function $\varphi$, so that the learning problem is reduced to an optimization problem of minimizing $\varphi$.
Two kinds of loss functions are commonly used, namely, the average loss and the maximal loss.
The average loss is popular, particularly in machine learning, since it can be conveniently minimized using online algorithms, which process a few instances at each iteration. The idea behind the average loss is to learn a function that performs equally well for each training point. However, when the i.i.d. assumption is not satisfied, minimizing the average loss may give poor results.
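The difference between the two losses can be made concrete in a few lines. This is a minimal sketch that assumes the per-sample losses have already been computed; the function names and the numbers are illustrative only.

```python
def average_loss(losses):
    """Mean of the per-sample losses."""
    return sum(losses) / len(losses)


def maximal_loss(losses):
    """Largest per-sample loss."""
    return max(losses)


# Hypothetical per-sample squared errors: nine well-fitted points and one poorly fitted point.
losses = [0.01] * 9 + [4.0]

avg = average_loss(losses)  # dominated by the many small values
mx = maximal_loss(losses)   # determined entirely by the worst point
```

The average hides the single badly fitted point, while the maximal loss depends on nothing else.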
To overcome this difficulty, we use the max-mean as the loss function.
First, we split the training set into several groups such that the i.i.d. condition is satisfied within each group. Then the average loss is used for each group, and the maximum of the average losses among the groups is used as the final loss function. We propose the resulting mini-max problem, referred to as problem (2), for the nonlinear regression problem.
Here, $n_j$ is the number of samples in group j.
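Written out, a max-mean problem matching this description might take the following form. The symbols $g^{\theta}$ (the model with parameters $\theta$), N (the number of groups) and $(x_{jl}, y_{jl})$ (the l-th sample of group j) are notation assumed for this sketch, and $\varphi$ is used here for the loss on a single sample.

```latex
\min_{\theta}\; \max_{1 \le j \le N}\; \frac{1}{n_j} \sum_{l=1}^{n_j} \varphi\bigl( g^{\theta}(x_{jl}),\, y_{jl} \bigr)
```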
Problem (2) is a generalization of problem (1). Next, we will give a numerical algorithm which solves problem (2).
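The max-mean objective can be sketched directly in code. The example below is an illustration under stated assumptions: `predict`, `loss` and the grouped data are hypothetical names and values, and choosing a linear `predict` with a squared-error `loss` gives the linear special case.

```python
def max_mean_loss(predict, loss, groups):
    """Maximum over groups of the average per-sample loss within each group."""
    group_means = []
    for samples in groups:
        total = sum(loss(predict(x), y) for x, y in samples)
        group_means.append(total / len(samples))
    return max(group_means)


def squared_error(prediction, target):
    return (prediction - target) ** 2


# Hypothetical data: two groups of different sizes n_j.
groups = [
    [(0.0, 1.0), (1.0, 3.0)],              # lies exactly on y = 2x + 1
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # lies exactly on y = x
]


def linear(x):
    return 2.0 * x + 1.0


value = max_mean_loss(linear, squared_error, groups)
```

Here the model fits the first group perfectly, so the objective is determined by the second group alone.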
Remark 2.1. Peng and Jin [2] put forward a max-mean method for parameter estimation when the usual i.i.d. condition is not satisfied.
They show that if $Z_1, Z_2, \dots, Z_k$ are drawn from the maximal distribution $M_{[\underline{\mu},\overline{\mu}]}$ and are nonlinearly independent, then the optimal unbiased estimation for $\overline{\mu}$ is $\max\{Z_1, Z_2, \dots, Z_k\}$. This fact, combined with the Law of Large Numbers (Theorem 19 in [2]), leads to the max-mean estimation of $\overline{\mu}$. We borrow this idea and use the max-mean as the loss function for the nonlinear regression problem.

Algorithm
Problem (2) is a mini-max problem. Mini-max problems arise in various mathematical fields, such as game theory and worst-case optimization. The general mini-max problem is described as $\min_{u \in \mathbb{R}^n} \max_{v \in V} h(u, v)$. (3) Here, h is continuous on $\mathbb{R}^n \times V$ and differentiable with respect to u, and V is a compact subset of $\mathbb{R}^d$.
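Kiwiel [4] linearized h at each iterate $u_k$ and obtained the convex approximation $\hat{h}(u) = \max_{v \in V} \{ h(u_k, v) + \langle \nabla_u h(u_k, v), u - u_k \rangle \}$. The next step is to find $u_{k+1}$, which minimizes $\hat{h}(u)$.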
In general, $\hat{h}$ is not strictly convex with respect to u and thus may not admit a minimum. To overcome this difficulty, he added a regularization term to the minimization problem. The regularized problem can be converted to an equivalent constrained form, which by duality theory is further transformed into a minimization over all $\lambda = (\lambda_v)$ with finitely many $\lambda_v \neq 0$. Our problem (2) is a special case of problem (3) in which v ranges over the finite set of group indices j. Denote $f_j(u) = h(u, j)$. We give a numerical algorithm for the mini-max problem $\min_{u \in \mathbb{R}^n} \max_j f_j(u)$.
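For the finite case with functions $f_j$, the chain of reformulations can be sketched as follows. This is a reconstruction under assumptions: the step is written as $d = u - u_k$, the regularization term is taken to be $\tfrac12\|d\|^2$, and a is an auxiliary scalar bounding the linearized values.

```latex
\begin{aligned}
&\text{(regularized subproblem)} && \min_{d}\; \max_{j}\,\bigl\{ f_j(u_k) + \langle \nabla f_j(u_k), d \rangle \bigr\} + \tfrac12 \|d\|^2, \\
&\text{(constrained form)} && \min_{d,\,a}\; \tfrac12 \|d\|^2 + a \quad \text{s.t.}\quad f_j(u_k) + \langle \nabla f_j(u_k), d \rangle \le a \ \text{ for all } j, \\
&\text{(dual)} && \min_{\lambda}\; \tfrac12 \Bigl\| \sum_j \lambda_j \nabla f_j(u_k) \Bigr\|^2 - \sum_j \lambda_j f_j(u_k) \quad \text{s.t.}\quad \lambda_j \ge 0,\ \sum_j \lambda_j = 1, \\
&\text{(direction)} && d_k = -\sum_j \lambda_j \nabla f_j(u_k).
\end{aligned}
```

The dual follows from the Lagrangian of the constrained form: stationarity in a forces $\sum_j \lambda_j = 1$, and stationarity in d gives the direction in the last line.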
Suppose each $f_j$ is differentiable with respect to u and denote $\Phi(u) = \max_j f_j(u)$.

Step 1. Initialization
Set the algorithm parameters, including the stepsize factor $\sigma = 0.5$.

Step 2. Direction Finding
Assume that we have chosen $u_k$. The search direction is computed from $(u_k, \Phi(u_k))$ as follows.
Step 2.1. Initialization
Step 2.2. Weight Finding
Solve the quadratic optimization problem for the weights and evaluate $\Psi_i$. If $\Psi_i \ge -\xi$, stop; otherwise, go to Step 2.3.

Step 3. Line Search
The stepsize is $\alpha_k = \max I_k$.
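The overall structure of such a method (weight finding, direction, line search) can be sketched in plain Python. This is a simplified illustration, not the algorithm exactly as stated above: the quadratic problem over the weights is solved approximately by a Frank-Wolfe iteration (a substitution made here), the line search is an Armijo-type backtracking with factor `sigma` and a hypothetical constant `c`, and `xi` is used as a tolerance on the squared norm of the direction. Steps 2.1 to 2.3 are not reproduced.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def find_weights(f_vals, grads, iters=500):
    """Approximately minimize 0.5*||sum_j lam_j g_j||^2 - sum_j lam_j f_j over the
    probability simplex, using Frank-Wolfe with exact line search."""
    n_funcs, dim = len(grads), len(grads[0])
    lam = [1.0 / n_funcs] * n_funcs
    for _ in range(iters):
        g_bar = [sum(lam[j] * grads[j][i] for j in range(n_funcs)) for i in range(dim)]
        q_grad = [dot(grads[j], g_bar) - f_vals[j] for j in range(n_funcs)]
        s = min(range(n_funcs), key=lambda j: q_grad[j])  # best simplex vertex
        slope = q_grad[s] - dot(lam, q_grad)              # derivative toward that vertex
        if slope > -1e-12:
            break
        diff = [grads[s][i] - g_bar[i] for i in range(dim)]
        curvature = dot(diff, diff)
        gamma = 1.0 if curvature == 0.0 else min(1.0, -slope / curvature)
        lam = [(1.0 - gamma) * l for l in lam]
        lam[s] += gamma
    return lam


def minimize_max(fs, grad_fs, u0, sigma=0.5, xi=1e-8, c=0.1, max_iter=200):
    """Minimize Phi(u) = max_j fs[j](u) by linearization plus backtracking line search."""
    u = list(u0)
    for _ in range(max_iter):
        f_vals = [f(u) for f in fs]
        grads = [g(u) for g in grad_fs]
        lam = find_weights(f_vals, grads)                 # weight finding
        d = [-sum(lam[j] * grads[j][i] for j in range(len(fs))) for i in range(len(u))]
        d_norm2 = dot(d, d)
        if d_norm2 <= xi:                                 # approximate stationarity
            break
        phi_u, alpha = max(f_vals), 1.0
        while alpha > 1e-12:                              # try alpha = 1, sigma, sigma^2, ...
            trial = [ui + alpha * di for ui, di in zip(u, d)]
            if max(f(trial) for f in fs) <= phi_u - c * alpha * d_norm2:
                break
            alpha *= sigma
        else:
            break                                         # no acceptable stepsize found
        u = trial
    return u


# Toy problem: Phi(u) = max((u - 1)^2, (u + 1)^2), which is minimized at u = 0.
fs = [lambda u: (u[0] - 1.0) ** 2, lambda u: (u[0] + 1.0) ** 2]
grad_fs = [lambda u: [2.0 * (u[0] - 1.0)], lambda u: [2.0 * (u[0] + 1.0)]]
u_star = minimize_max(fs, grad_fs, [3.0])
```

Frank-Wolfe is chosen only to keep the sketch dependency-free. For many groups, a dedicated quadratic programming solver would be the more natural choice for the weight-finding step.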

The Linear Regression Case
Example 1.1 can be solved numerically by the above algorithm, applied to the corresponding instance of problem (1). The numerical result using the algorithm in Section 3 is y = 1.7589x + 1.2591.
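To apply a gradient-based mini-max method to the linear case, each group needs a loss value and a gradient with respect to (w, b). The sketch below assumes a scalar input x and the mean squared error per group. The function name and the data are hypothetical, and the data are not those of Example 1.1.

```python
def group_losses_and_grads(w, b, groups):
    """For y = w*x + b, return each group's mean squared error and its gradient in (w, b)."""
    results = []
    for samples in groups:
        n = len(samples)
        residuals = [w * x + b - y for x, y in samples]
        loss = sum(r * r for r in residuals) / n
        grad_w = sum(2.0 * r * x for r, (x, _) in zip(residuals, samples)) / n
        grad_b = sum(2.0 * r for r in residuals) / n
        results.append((loss, (grad_w, grad_b)))
    return results


# Hypothetical grouped data, for illustration only.
groups = [[(0.0, 1.0), (1.0, 3.0)], [(0.0, 0.0), (2.0, 2.0)]]
results = group_losses_and_grads(1.0, 0.0, groups)
```

At (w, b) = (1, 0) the second group is fitted exactly, so only the first group contributes a nonzero loss and gradient.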
The following figure summarizes the result. It can be seen that the method using the maximal loss function (black line) performs better than the traditional least squares method (pink line).

Machine Learning
In this case, we use the MNIST database of handwritten digits to perform the experiment. Many machine learning models cater to the identification of handwritten digits. We use the multi-layer perceptron model. Three hidden layers along with an input layer and an output layer are used. The numbers of neurons in the layers are 784, 50, 20, 12, and 10, respectively.
The training data are chosen as follows. First, we choose 1000 training samples and split them randomly into 10 groups, named $G_1, G_2, \dots, G_{10}$. Each group has 100 i.i.d. samples. Two different methods are applied. One uses the average group loss, that is, the per-group average loss averaged over the groups. The other uses the maximal group loss, that is, the maximum of the per-group average loss over the groups. An additional 10,000 test samples are used to test these two models. The accuracy of the method using the maximal loss is 65.3%, while the accuracy of the method using the average loss is 58.1%.
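The grouping and the two training objectives can be sketched as follows. This is an illustration only: the helper names are hypothetical, the seed is arbitrary, and the per-sample losses are placeholders standing in for the values a network would produce.

```python
import random


def split_into_groups(indices, n_groups, seed=0):
    """Shuffle the sample indices and cut them into equally sized groups."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // n_groups
    return [shuffled[k * size:(k + 1) * size] for k in range(n_groups)]


def group_means(per_sample_loss, groups):
    """Average loss within each group."""
    return [sum(per_sample_loss[i] for i in g) / len(g) for g in groups]


groups = split_into_groups(range(1000), 10)

# Placeholder losses; in training these would come from the network's outputs.
per_sample_loss = [((i * 37) % 101) / 100.0 for i in range(1000)]

means = group_means(per_sample_loss, groups)
average_group_loss = sum(means) / len(means)  # objective of the first method
maximal_group_loss = max(means)               # objective of the second method
```

The maximal group loss is never smaller than the average group loss, so minimizing it targets the worst-fitted group.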
In this experiment, the whole training set is not i.i.d. On the other hand, each subgroup is i.i.d. It turns out that the method using the maximal loss performs better than the method using the average loss. In the last 20 years, deep learning with many more hidden layers and other machine learning algorithms have achieved accuracy over 90% in handwriting recognition and other problems. However, we think that the method in this paper can also improve performance when the training set is not i.i.d.
In this paper, we consider a class of nonlinear regression problems without the assumption that the data are independent and identically distributed (i.i.d.). We propose a corresponding mini-max problem for nonlinear regression and outline a numerical algorithm. The algorithm can be applied to regression and machine learning problems and, in the experiments reported here, yields better results than the traditional least squares and average-loss methods.