Interval Analysis and Optimization Applied to Parameter Estimation under Uncertainty

We present a methodology through exemplification to perform parameter estimation subject to possible factors of uncertainty. The underlying optimization problem is posed in the framework of the theory of interval-valued optimization. The implementation of numerical procedures required to achieve efficient solutions implied the use of the l1 norm instead of usual l2 regression. Finally, an implementation using real data was performed, demonstrating the ability of interval analysis to encapsulate uncertainty while facing non-trivial parameter estimation problems.


Introduction
The majority of mathematical models developed to represent various problems depend on sets of parameters whose values are generally determined based on experimental measurements or inferred according to data obtained through observation of the studied phenomenon.
Given a set of experimental data (inaccurate because of the influence of randomness and uncertainty), we would like to estimate the values of those parameters, so that the model attains an acceptable fitting level to the observed data.
In the usual estimation techniques, a vector of real numbers is obtained (a single point estimation for each of the parameters involved), which causes that the probability that the result of the estimation happens to be the actual value of the parameter is fairly low and, therefore, is very sensitive to perturbations [1].
Nevertheless, the use of interval-valued analysis allows us to provide regions in which, given a determined confidence level, we can ensure the enclosing of the solution, thus reducing the impact of random disturbances and computational errors in the final result.
The foundations of interval analysis were established by Moore [1] in his PhD dissertation.Based on that work and further developments provided by Skelboe [2], Hansen [3] and Stroem [4], among others, several authors (Ratschek and Voller [5], Bhurjee and Panda [6]) have studied the potential of interval valued techniques in the field of optimization.On the other hand, authors like Ichida and Fujii [7] or Karmakar and Bhunia [8] have developed significant research related to the implementation of multiobjective techniques including interval valued optimization.However, there is a lack of efforts in the use of these procedures in the context of parameter estimation which is one of the principal aims of this research work.
Thus, the uncertainty in those values due to measurement errors, rounding in calculations, computational representability and even possibly because of a lack of knowledge of qualitative information associated with those parameters, pose interval analysis and, particularly, interval-valued optimization as a very useful tool, leading to the development and establishment of robust techniques when facing the mentioned difficulties, obtaining more reliable and rigorous results in a mathematical perspective.
A methodology to perform parameter estimation under any kind of uncertainty is presented.The underlying optimization problem is posed in the framework of the theory of interval-valued optimization.Numerical results show that estimations obtained allow us to describe successfully the behaviour of the modelled phenomenons by enclosing adequately the uncertainty and sensitivity of the model, overcoming possible difficulties originated in errors generated while measuring or modelling, even in cases with low information availability.
In particular, the first numerical example allowed us to compare the efficacy of the optimization algorithms using an adaptation of the ℓ1 norm in the interval system using the Hausdorff metric, instead of usual ℓ2 regression, related to the Euclidean metric space.Information related to normed spaces can be found in [9] and [10].
Interval Analysis and Optimization Applied to Parameter 109 This paper is organized as follows.In Section 2 we introduce the main concepts of interval arithmetic and analysis, in order to present a concise approach to this theory for the reader.In Section 3, we review some theoretical results related to interval-valued optimization.Finally, Section 4 presents the numerical results obtained for theoretical and real applications of interval optimization techniques.

Interval Arithmetic and Analysis
With the aim of presenting a self contained article, this section describes in general terms the notation, arithmetic and analytic components and some theorems related to interval computation.
Let K c (R) denote the set of all non-empty, compact and convex subsets of R. Let A, B ∈ K c (R) and let ⊙ ∈ {+, −, •, /} be a binary operation on R, e.g., addition, subtraction, multiplication and division.
Let us denote by I the class of all closed and bounded intervals in R. For an interval A we adopt the notation A = a L , a U , where a L and a U mean the infimum and supremum of A, respectively.Throughout this paper, upper case non italicletters represent intervals and lower case non italic-letters represent real number.Whenever the context may not be clear, proper comments shall be made.

Set Operations
Since, intervals are essentially sets of real numbers, it is often useful to define operations related to their behaviour as sets.In particular, intersection plays a key role in interval analysis.If we have two intervals containing a result of interest, regardless of how they were obtained, then the intersection, which may be narrower (see Figure 1), also contains the result.Further set operations between intervals are described in [1] and [11].Figure 2: Geometric properties of an interval.

Interval Properties
Recalling the notation A = a L , a U , we can describe some of the main properties of an interval in term of its endpoints, as can be seen in Figure 2. Definition 2.2.Let A ∈ I.We define the length of A as Definition 2.3.Let A ∈ I.The absolute value of A is defined by

Arithmetic Operations
Let A, B ∈ I. Let ⊙ ∈ {+, −, •, /} be a binary operation on R, e.g., addition, subtraction, multiplication and division.The operation A ⊙ B is defined by: For instance, for the case of the sum of two intervals, the set A + B can be explicitly described in terms of the endpoints of such interval: The Hukuhara difference, ⊖, is a special kind of subtraction between two intervals [12], particularly important for the definition of differentiation of intervalvalued functions.
A complete list of endpoint formulas for the usual arithmetic operations and more details on the topic of interval computation can be found in [1] and [11].
There is a close relation in the mathematical structure of I and R: for example, I is commutative and associative under addition and multiplication, the cancellation law for addition holds and there exist identity elements for such operations.However, we caution that, for A ∈ I, −A is not in general an additive inverse for A, since Similarly, it can be proved that A/A = [1, 1] only of l(A) = 0. Therefore, we do not have additive or multiplicative inverses except for degenerate intervals.However, we always have the inclusions 0 ∈ A − A and 1 ∈ A/A (see [1]).For example, consider the interval Because of this lack of inverse elements under addition, I can not constitute a vector space by itself.However, the work from Radstroem [13] develops the theory of an extension set via equivalence relations in which a commutative semigroup in which the law of cancellation holds, as is indeed true in I, can be embedded in a vector space N where the product λA for λ ≥ 0 coincides with the one given on I.
This allows us to ensure the existence of the family I m , which represents the set of m-dimensional vectors with entries in I, as well as the natural extension to sets of interval-valued matrices.

Interval Analysis
In order to build an analytical structure on I it is fundamental to define the notion of a metric between intervals.Hausdorff [14] proposed a metric between subsets X and Y of a metric space E, given by: where • is the distance defined on the metric space (E, • ).
In particular, the Hausdorff norm induces a metric for the interval system that can be expressed in terms of the midpoints of the selected intervals Given the metric space (I, d H (•)), it is possible to define the usual notions of convergence and limit in this space.Definition 2.5.Let {A n } and A ∈ I.We say that the sequence of intervals {A n } converges to A, denoted by lim n→∞ A n = A, if, for every ǫ > 0, there exists N ∈ N, such that, for n ≥ N , we have d H (A n , A) < ǫ.
Lemma 2.6.Convergence in I can be reduced to the usual convergence in R, that is, Definition 2.7.The function f : R n → I defined on an Euclidean space R n is called an interval-valued function.This function can also be written as where f L and f U are real-valued functions defined on R n and satisfy This definition of interval-valued functions allows us to handle a wide range of situations in which we can describe a function of this nature in terms of two real valued functions, but also we could present a function whose parameters are intervals in I. Definition 2.8.For c ∈ R n , we write lim x→c f (x) = A if, for every ǫ > 0, there exists δ > 0 such that, for x − c < δ, we have d H (f (x), A) < ǫ.In this case, we say A is the limit of f when x tends to c. Theorem 2.9.Let f be an interval-valued function defined on Definition 2.10.Let f be an interval-valued function defined on R n .We say that Further information regarding interval-valued functions, their properties, and proofs of related theorems can be found in [15].

Order Relations
Since, in the context of optimization its is necessary to compare the images of interval-valued functions to be minimized (maximized), a corresponding group of partial order relations can be defined for intervals.These relations are illustrated in Figures 3 -5.Figure 4: Order relation -UC .
In the minimization case, the CW order is particularly interesting because it allows us to obtain the interval with the minimum midpoint and also the smaller width, which in this case can be seen as the variance or uncertainty in the measurement or evaluation of the function.
] is called weakly differentiable at x 0 ∈ X if the real valued functions f L and f U are differentiable at x 0 (in the usual sense).Definition 3.3.Let X be an open set in R. We say f : X → I is H-differentiable (strongly differentiable) at x 0 ∈ X if there exists A(x 0 ) ∈ I such that both exist and are equal at A(x 0 ).Then A(x 0 ) is the H-derivative of f at x 0 .
The following theorem presents the conditions in which a function is (or not) H-differentiable at a point.Theorem 3.4.(Wu [15]) Let X be an open set in R and f : X → I an intervalvalued function defined on X. Suppose that f is weakly differentiable at x 0 ∈ X with derivatives (f L ) ′ (x 0 ) = âL (x 0 ) and (f U ) ′ (x 0 ) = âU (x 0 ).Then, 2. If âU (x 0 ) > âL (x 0 ), then f is H-non-differentiable at x 0 .

KKT Optimality Conditions
In optimization, the Karush-Kuhn-Tucker (KKT) conditions are first order necessary conditions for a solution in non-linear programming to be optimal, provided that some regularity conditions are satisfied [16].The conditions derived in [15] allow us to determine the optimality of a feasible solution of an interval valued optimization problem.
for each λ ∈ (0, 1) and each x ∈ X.It is possible to define UC-convexity in a similar fashion.
Remark 3.6.Let X be a convex subset of R n and f be an interval-valued function defined on X.The function f is LU-convex at x * ∈ X if and only if f L and f U are convex (in the usual sense) at x * ∈ X.
Definition 3.7.Let x * be a feasible solution, i.e., x * ∈ X.We say that x * is a type-I solution [type-II solution] of an interval-valued optimization problem if there exists no x ∈ X such that f (x) Theorem 3.8.(Wu [15]) Consider the interval valued optimization problem (IVOP) given by subject to g i (x) ≤ 0 where the real-valued constraint functions g i : R n → R are convex on R n for i = 1, ..., m.Suppose that the interval-valued objective function f : R n → I is LU-convex and weakly continuously differentiable at x * ∈ R n .If there exist (Lagrange) multipliers 0 < λ L , λ U ∈ R and 0 ≤ µ i ∈ R, i = 1, ..., m, such that 1. λ L ∇f L (x * ) + λ U ∇f U (x * ) + m i=1 µ i ∇g i (x * ) = 0 and 2. µ i g i (x * ) = 0 for all i = 1, ..., m then x * is a type-I and type-II, i.e. optimal under the selected order relation, solution of (IVOP) problem.

Generalized Polynomial Fitting
We say p(x) is an interval-valued polynomial if it can be expressed in the form Consider a set of observations y i = [y L i , y U i ] ∈ I for i ∈ {1, ..., m}.We can model this phenomenon using a n degree polynomial in a matrix form as follows: where V is called the Vandermonde matrix and each ε i in the matrix E is a random error which accounts for the uncertainty in the measurements of the studied phenomenon.A polynomial of degree 9 with random valued interval coefficients (see Table 1) was generated and a random sample, corresponding to the purple intervals, was extracted, see Figure 6.In this case some of the purple sample intervals lie outside the gray area enclosed by the theoretical model due to the influence of the error matrix E, which was synthetically generated using a normal distribution with µ = 0 and σ 2 = 4.With this information, the goal was to estimate the original values of the coefficients that generated this behaviour.
As it is usual in parameter estimation, and even more in polynomial fitting, because of its desirable computational speed and simplicity, our first attempt to achieve that goal was to perform an ordinary least square (OLS) estimation, in which the coefficients of a real-valued polynomial given real observations can be estimated by: This result can be obtain by minimizing the ℓ 2 norm of the residuals between the midpoints of fitted model and the real measurements.
where {y i } is the set of m interval observations and {ŷ i } are the corresponding estimations.
Nevertheless this results provides good results in term of an unbiased estimation of the midpoints of the coefficients of the polynomial, there can exist very high overestimations in the length of the interval (see Figure 7 and Table 2) thus not allowing the identification of the real sensitivity of each parameter in the model.In Figure 7, the red bars represent the real values of the parameters, while the gray bars illustrate the value of the estimated coefficient.A good estimation should, of course, show little discrepancies between the gray and red bars for every coefficient, achieving an enclosure of the parameter, thus being unbiased in midpoint and length.
With the aim of reducing the mentioned difficulties, an heuristic differential evolution algorithm was implemented.Differential Evolution (DE) was developed originally by Price [17] while trying to solve a Chebychev polynomial fitting problem proposed by Storn.A complete description of DE can be found in [17].
The fundamental idea behind DE is the generation of vector through linear combinations of elements in the current population.Then, given an individual in the population, this element is compared with the trial vector generated via linear combination and, if the trial element has a better performance in terms of the objective function to be minimized (maximized), a crossing in the components of the original vector is made under a defined probability level.The estimations obtained using the heuristic search are presented in Figure 8 and Table 3.There is an improvement in the quality of the estimation according to the magnitude of the estimations in relation with the real values of the parameters.In most of the estimations there is an almost unbiased estimation in the midpoint of the intervals.However, some of the coefficients are underestimated or overestimated in the length of the interval, as can be seen in coefficients 7 and 8. Additionally, given the nature of the heuristic, the quality of the estimations is not very uniform and in some cases, the search does not converge to adequate values of the parameters.
As another alternative, we used the software implementation cvx for convex optimization developed in [18] [19].To avoid the overestimation of the interval length, the metric induced in I by the ℓ 1 norm was used to measure the residuals, which can be expressed in terms of the Hausdorff distance in I.In this way the optimization problem can be stated as min Given the definition of this metric, at an optimal point we can ensure an unbiased estimation of both endpoints, which is equivalent to a lack of midpoint-length bias.
The results of this methodology are presented in Figure 9.As can be seen, the estimations agree successfully with the real values of the parameters, with errors of magnitude 10 −7 in relation to the theoretical endpoints.Thus, allowing us to model closely the observed behaviour using an interval-valued polynomial model and enclosing the possible uncertainty in the problem.

Weierstrass Function
In order to evaluate the feasibility of an estimation of the parameters of a model using real data, the employed techniques were tested using data sampled from a Weierstrass function.This function is a typical example of a a pathological realvalued function on the real line, which is continuous everywhere but differentiable nowhere.For a more extensive descrpition of this function refer to [20].The Weierstrass function is given by where 0 < a < 1, b is a positive integer, and ab > 1 + 3 2 π.
In this case, the objective was to extract information about the value of the coefficient a, in this case treated as an interval, based on a set of measurements for x ∈ [0, 1], using the ℓ 1 induced metric in I to define the optimization problem, as shown previously.To favour the clarity of the graphic, the estimated behaviour of the model and a subset of the measurements is presented in the interval [0, 0.125].As can be seen in Figure 10, the estimated coefficient for this model is able to handle the chaotic and noisy behaviour of this function, and also the extreme sensitivity that exists in this parameter, which generates changes of a large magnitude and form of the images of the function when small changes are applied to the value of a, for whom the estimation was remarkably exact.

Fourier Series Applied to the Modelling of Spectral Power Densities
Using hydrophones, measurements of the spectral power density of the sound signals generated by vessels were performed in order to develop a characterization of such crafts.In total 36 measurements were performed, however 12 of those were discarded due to factors that generated changes in behaviour of the spectrum, for example, changes in the speed of the boat and its engines.The 24 accepted measurements are presented in Figure 11, where the horizontal axis represents the frequency in Hz and the vertical axis the spectral power density in dB/Hz.Because of confidentiality issues, the source of this data cannot be specified.In order to describe this behaviour a Fourier series model was proposed.A Fourier series is a way to represent a wave-like function as the sum of simple sine waves, decomposing the signal into the sum of a (possibly infinite) set of simple oscillating functions, namely sines and cosines, as follows: where a 0 models a constant (intercept) term in the data, w is the fundamental frequency of the signal, n is the number of terms (harmonics) in the series.In this case, several order models were estimated, however, an 8 order model followed the trends observed in the measurements in a much more adequate way, especially in the initial and final parts of the data.
Based on the complete set of measurements, the upper and lower bounds in each instant were extracted and, with this (reduced ) information, the coefficients for the model of each of the bounding trajectories were estimated, obtaining the fits shown in Figures 12 and 13 (see Tables 4 and 5). Figure 13: Fitted model upper bound.
Using these estimations an interval-valued function was proposed to enclose the volatility of the measurements using Fourier series to describe the lower and upper functions, i.e. f : R → I, given by f (x) = f L (x), f U (x) , where each of the bounding functions are chosen to be a finite Fourier approximation, as: An interval-valued plot generated from this model is presented in Figure 14.As can be seen in Figure 11, in which the black bold line represent the estimated bounding functions of the model, who is able to describe a significant proportion of the variance present in the measurements performed.
Note that each of the bounds of the enclosing function f is modeled independently using maximum and minimum points of the data at each time.For this reason, obtaining a good fit in both of those functions will ensure that the function f is well defined since, clearly, a good model for the lowest point in every instant, f L , is less than or equal to a good model for the upper bound in the data.This observation can be verified, for the obtained fits in this specific example, in Figure 14.It is remarkable that the effectively used information was related only to the maximum-minimum measurements in every instant.Therefore, the quality of the estimations obtained could have been equally as good as the presented in a situation of scarce available information.It is also possible to perceive a reduction in noise of the signals provided by the model, which is important in order to approximate the local behaviour of the phenomenon.

Definition 2 . 1 .
Let A and B ∈ I.The intersection of two intervals A and B is empty if either b U < a L or a U < b L .Otherwise, A ∩ B := {x : x ∈ A ∧ x ∈ B} = max a L , b L , min a U , b U

Definition 3 . 1 .
Let A = [a L , a U ] and B = [b L , b U ] ∈ I.It is possible to express A as a function of its centre and width, as A = m(A), w(A) .• A LU B if and only if a L ≤ b L and a U ≤ b U • A UC B if and only if a U ≤ b U and m(A) ≤ m(B) • A CW B if and only if m(A) ≤ m(B) and w(A) ≤ w(B)

Figure 11 :
Figure 11: Real measurements -Level as a function of frequency

Figure 14 :
Figure 14: Interval-valued plot of the estimated Fourier series model.

Table 1 :
Real polynomial coefficients

Table 2 :
OLS estimated parameters

Table 3 :
Differential Evolution estimated parameters

Table 4 :
Fourier series -f L estimated parameters