Investigation on Extended Conjugate Gradient Non-linear Methods for Solving Unconstrained Optimization

Khalil K. Abbo & Osama M. T. Waiss


Abstract
In this paper we generalize the extended Dai-Yuan conjugate gradient method by considering the parameter in the denominator of $\beta_k$ as a convex combination. Three values of the combination parameter are computed in three different ways, namely by assuming the descent property, by the pure conjugacy condition, and by using the Newton direction.
The descent property and global convergence of the proposed algorithms are established. Our numerical experiments on some standard test functions show a considerable improvement over other classical methods in this field.

1-Introduction
Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. A line search algorithm for (1) generates a sequence of iterates by letting

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$

where $x_k$ is the current iterate, $d_k$ is a descent search direction, i.e. $g_k^T d_k < 0$, and $\alpha_k > 0$ is a step length. Different choices of $d_k$ and $\alpha_k$ determine different line search methods [8-10]. These methods proceed in two stages at each iteration: a) choose a descent search direction $d_k$; b) choose a step size $\alpha_k$ along the search direction $d_k$. Throughout this paper we denote $f(x_k)$ by $f_k$, $\nabla f(x_k)$ by $g_k$, and $\nabla^2 f(x_k)$ by $G_k$; $\|\cdot\|$ denotes the Euclidean norm. The simplest line search method is the steepest descent method, obtained by taking $d_k = -g_k$ at every iteration; it has wide application in solving large-scale minimization problems [11]. However, the steepest descent method often exhibits a zig-zag phenomenon on practical problems, which makes the algorithm converge to an optimal solution very slowly or even fail to converge [6], so steepest descent (SD) is not recommended for practical use.
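The zig-zag behaviour of steepest descent is easy to reproduce. The following sketch (illustrative, not from the paper: the quadratic, its conditioning, and the iteration budget are our own choices) applies iteration (2) with $d_k = -g_k$ and an exact line search to an ill-conditioned quadratic; the gradient norm is still far from zero after 100 iterations.

```python
import numpy as np

# Steepest descent on f(x) = 0.5 x^T A x with exact line search.
# A, the start point and the iteration count are illustrative choices.
A = np.diag([1.0, 100.0])            # ill-conditioned Hessian (kappa = 100)

def grad(x):
    return A @ x                      # gradient of 0.5 x^T A x

x = np.array([100.0, 1.0])
for k in range(100):
    g = grad(x)
    alpha = (g @ g) / (g @ (A @ g))   # exact minimizer along d_k = -g_k
    x = x - alpha * g                 # iteration (2): x_{k+1} = x_k + alpha_k d_k

final_norm = float(np.linalg.norm(grad(x)))
print(final_norm)                     # still far from zero after 100 steps
```

On this example successive gradients alternate direction while shrinking only by a constant factor per step, which is the zig-zag phenomenon described above.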
If $d_k = -B_k^{-1} g_k$ is the search direction at each iteration of the algorithm, where $B_k$ is an $n \times n$ matrix approximation to the Hessian $G_k$, then the corresponding line search method is called a Newton-like method (such as a quasi-Newton or variable metric method); on the other hand, if $B_k = G_k$ the method is the Newton method, which is one of the most successful algorithms for unconstrained optimization, provided that $B_k$ is symmetric and positive definite and satisfies the quasi-Newton condition

$$B_{k+1} s_k = y_k, \qquad (3)$$

where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. For a general nonlinear objective function, the convergence of the Newton algorithm to a solution cannot be guaranteed from an arbitrary initial point $x_1$. In general, if the initial point is not sufficiently close to the solution, the algorithm may not possess the descent property. The other drawback of Newton and quasi-Newton methods is the need to store and compute the matrix $B_k$ at each iteration, which adds cost in storage and computation. Accordingly, these methods are not suitable for solving large-scale optimization problems in many cases [7].
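To make the Newton step and the quasi-Newton condition (3) concrete, here is a minimal sketch (our own illustrative quadratic, not an example from the paper): one Newton step minimizes a convex quadratic exactly, and the exact Hessian satisfies $B s_k = y_k$.

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T G x - b^T x, the Newton step
# d = -G^{-1} g reaches the minimizer in a single iteration, and the
# exact Hessian satisfies the quasi-Newton condition (3): B s_k = y_k
# with s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k.
G = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # symmetric positive definite
b = np.array([1.0, 2.0])

def grad(x):
    return G @ x - b

x0 = np.array([5.0, -5.0])
d = -np.linalg.solve(G, grad(x0))     # Newton direction
x1 = x0 + d                           # unit step length

s, y = x1 - x0, grad(x1) - grad(x0)
print(float(np.linalg.norm(grad(x1))))  # ~0: one Newton step suffices
print(bool(np.allclose(G @ s, y)))      # True: condition (3) holds
```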
The conjugate gradient method is very useful for solving (1), especially when $n$ is large. It has the form

$$d_1 = -g_1, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad (4)$$

where $\beta_k$ is a parameter. In the case when $f$ is a convex quadratic function and the line search is exact, the conjugate gradient method is such that the conjugacy condition holds [2], namely

$$d_i^T G d_j = 0, \quad i \neq j. \qquad (5)$$

For a general nonlinear function, Dai and Liao [2] replaced the conjugacy condition (5) by

$$d_{k+1}^T y_k = 0, \qquad (6)$$

which is called the pure conjugacy condition. Additionally, if an inexact line search is used (see [2]), the condition (6) can be written as

$$d_{k+1}^T y_k = -t\, g_{k+1}^T s_k, \qquad (7)$$

where $t \geq 0$ is a scalar. Several kinds of formulas for $\beta_k$ have been proposed. For example, the Fletcher-Reeves (FR), Polak-Ribiere (PR), and Hestenes-Stiefel (HS) formulas are well known; they are given by

$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_k^{PR} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \qquad \beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}.$$

The global convergence properties of the FR, PR and HS methods without regular restarts have been studied by many researchers [1-3]. Conjugate gradient methods with regular restarts can be found in [4].
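As an illustration of the recurrence (4) and the conjugacy condition (5), the sketch below (a made-up 5x5 convex quadratic with exact line search, not an example from the paper) runs the CG recurrence with the Fletcher-Reeves formula; the successive directions come out $G$-conjugate and the method terminates in at most $n$ steps.

```python
import numpy as np

# CG recurrence (4) with the Fletcher-Reeves parameter on a convex
# quadratic f(x) = 0.5 x^T G x - b^T x (illustrative random data).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
G = M @ M.T + 5 * np.eye(5)          # symmetric positive definite Hessian
b = rng.standard_normal(5)

x = np.zeros(5)
g = G @ x - b                        # g_1
d = -g                               # d_1 = -g_1
conjugacy = []
for k in range(5):
    alpha = -(g @ d) / (d @ (G @ d))   # exact line search step
    x = x + alpha * d
    g_new = G @ x - b
    beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves formula
    d_new = -g_new + beta * d
    conjugacy.append(abs(d_new @ (G @ d)))  # should vanish: condition (5)
    g, d = g_new, d_new

print(max(conjugacy))                # ~0: directions are G-conjugate
print(float(np.linalg.norm(g)))      # ~0: converged in n = 5 steps
```

With an exact line search on a quadratic, FR, PR and HS all coincide, so the same loop with either of the other two formulas would behave identically here.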
To establish convergence properties of these methods, it is usually required that the step size $\alpha_k$ satisfy the strong Wolfe conditions (SWC):

$$f(x_k + \alpha_k d_k) \leq f_k + \delta \alpha_k g_k^T d_k, \qquad |g(x_k + \alpha_k d_k)^T d_k| \leq -\sigma g_k^T d_k,$$

with $0 < \delta < \sigma < 1$. On the other hand, many other numerical methods (e.g. the steepest descent method and quasi-Newton methods) for unconstrained optimization are proved to be convergent under the standard Wolfe conditions (SDWC), which are weaker than the SWC:

$$f(x_k + \alpha_k d_k) \leq f_k + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k.$$

Line search strategies require the descent condition

$$g_k^T d_k < 0. \qquad (13)$$

However, most conjugate gradient methods do not always generate a descent search direction [5], so condition (13) is usually assumed in the analysis and implementation. Some strategies have been studied which produce a descent search direction within the framework of conjugate gradient methods. For example, Hiroshi and Naoki [5] generalized the Dai and Yuan (DY) method [3], which is defined by

$$\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^T y_k}, \qquad (14)$$

where $y_k = g_{k+1} - g_k$. Their generalization of (14) assumes that (15) holds, with the quantities involved defined in (16). Equation (15) is equivalent to (17), and they suggested three different values for the parameter, given in (18), (19) and (20). The algorithm defined by equation (4), with $\beta_k$ given by (18), (19) or (20), is called the extension of the Dai-Yuan (DY) method, and the search directions generated by these algorithms are descent directions whenever condition (17) is satisfied; for more detail see [5].

This paper is organized as follows. In Section 2 we deal with an extension of the DY method and give another three different values for the parameter; these values are based on the descent property, the pure conjugacy condition, and the Newton direction. In Section 3 the convergence analysis is studied, and in Section 4 the numerical experiments are reported.
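The two Wolfe curvature tests above differ only by an absolute value. The following sketch (a one-dimensional quadratic with the conventional parameter values $\delta = 10^{-4}$, $\sigma = 0.9$; the example is our own illustration, not from the paper) checks both sets of conditions and exhibits a step that satisfies the SDWC but not the SWC, confirming that the SDWC are weaker.

```python
import numpy as np

# Standard (SDWC) vs strong (SWC) Wolfe conditions for a step alpha
# along a descent direction d; delta and sigma are conventional values.
def wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9):
    g0_d = grad(x) @ d                    # negative for a descent direction
    g1_d = grad(x + alpha * d) @ d
    sufficient = f(x + alpha * d) <= f(x) + delta * alpha * g0_d
    sdwc = sufficient and (g1_d >= sigma * g0_d)       # standard Wolfe
    swc = sufficient and (abs(g1_d) <= -sigma * g0_d)  # strong Wolfe
    return sdwc, swc

f = lambda x: float(x @ x)                # f(x) = ||x||^2
grad = lambda x: 2 * x
x = np.array([1.0])
d = -grad(x)                              # steepest descent direction

print(wolfe(f, grad, x, d, alpha=0.40))   # (True, True)
print(wolfe(f, grad, x, d, alpha=0.99))   # (True, False): SDWC only
```

The second step overshoots the one-dimensional minimizer, so the new directional derivative is large and positive: the SDWC accept it, while the SWC reject it.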

2-New proposed algorithms
In this section, we try to find new values for the parameter that satisfy the condition given in equation (17), using three different methods:

Descent property
Hiroshi and Naoki [5] showed that if condition (17) is satisfied, then the related conjugate gradient (CG) method always generates descent directions for all $k$. Now consider the quantity introduced above; the search direction can then be defined as in (21). With simple algebra this yields (22). Therefore our first new algorithm, say MH1-CG, can be defined as

MH1: (23)

where the parameter is defined in equation (22), with safeguarding conditions on its value. In equation (23) a multiplying factor is introduced for the purpose of global convergence.

Pure Conjugacy property
The second method of evaluating the parameter uses the pure conjugacy condition defined in equation (6).

Assuming parallel to the Newton direction
As is well known, when the current point is close enough to a local minimum, the best direction to follow is the Newton direction $-G_k^{-1} g_{k+1}$. Therefore our motivation is to choose the parameter $\beta_k$ in (4) so that for every $k$ the direction given by (4) is as close as possible to the Newton direction. Hence, equating the two directions gives (26), where $G^{-1}$ is the inverse Hessian, which is symmetric and positive definite. Multiplying equation (26) by a suitable vector, and using the quasi-Newton condition (3), simple computations give (27). Then the third algorithm, say MH3-CG, is given by (28), where the quantity in (29) is computed from (27), with a safeguarding condition on its value. In equation (28) a multiplying factor is introduced for the purpose of global convergence.

3-Convergence analysis
In this section we prove the global convergence property of the algorithm MH1. Our proof is based on the theorem given in the paper by Gilbert and Nocedal (1992). They show that any nonlinear conjugate gradient algorithm that satisfies Assumption (3.1) below is globally convergent, according to Theorem (1) and Theorem (2) (given later on).

Assumption (3.1):
(a) 1- The level set $S = \{x : f(x) \leq f(x_1)\}$ is bounded, where $x_1$ is the initial estimate of the minimizer. 2- In some neighborhood of $S$ the objective function $f$ is continuously differentiable and its gradient is Lipschitz continuous. 3- The step size $\alpha_k$ satisfies the Wolfe conditions. (b) The parameter $\beta_k$ satisfies the inequality given in (30) below.

Theorem (1): Suppose that Assumption (3.1) holds. Consider any method of the form (2) and (4), with $\alpha_k$ satisfying the SWC; then the method generates descent directions. Proof: see Gilbert and Nocedal (1992).

Theorem (2):
Suppose that Assumption (3.1) holds, and consider any method of the form (2) and (4); then $\liminf_{k \to \infty} \|g_k\| = 0$. Proof: see Gilbert and Nocedal (1992).

Theorem (1) and Theorem (2) show that for conjugate gradient methods in which $\beta_k$ satisfies the required bound and $\alpha_k$ satisfies the strong Wolfe conditions, the methods generate descent directions and are globally convergent. Therefore, to prove the descent property and global convergence of the MH1 (or MH2, MH3) conjugate gradient methods, we need only show (30). To prove inequality (30): since the quantities involved are positive scalars, the second Wolfe condition yields (31). To establish the second part of inequality (30), from (31) we have (32); multiplying (32) by the appropriate positive factor gives the result.
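Although the explicit inequalities (30)-(32) follow from standard manipulations of the second Wolfe condition, the key mechanism can be checked numerically. The sketch below (with made-up vectors, not data from the paper) verifies the well-known identity for the original Dai-Yuan parameter, $g_{k+1}^T d_{k+1} = \beta_k^{DY}\, g_k^T d_k$, which shows that descent is inherited from one iteration to the next whenever $d_k^T y_k > 0$ (guaranteed by the second Wolfe condition).

```python
import numpy as np

# Check of the identity g_{k+1}^T d_{k+1} = beta^DY * g_k^T d_k for the
# Dai-Yuan parameter beta^DY = ||g_{k+1}||^2 / (d_k^T y_k).
# The gradients below are made-up illustrative vectors.
g_k  = np.array([1.0, 2.0, -1.0])
g_k1 = np.array([0.5, -0.2, 0.3])
d_k  = -g_k                          # a descent direction: g_k^T d_k < 0
y_k  = g_k1 - g_k

assert d_k @ y_k > 0                 # curvature condition holds here
beta = (g_k1 @ g_k1) / (d_k @ y_k)   # Dai-Yuan formula (14)
d_k1 = -g_k1 + beta * d_k            # CG recurrence (4)

print(bool(np.isclose(g_k1 @ d_k1, beta * (g_k @ d_k))))  # True: identity
print(bool(g_k1 @ d_k1 < 0))         # True: descent is inherited
```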

4-Numerical Experiments
This section presents the performance of a FORTRAN implementation of our new conjugate gradient algorithms (MH1, MH2 and MH3) on a set of unconstrained optimization test problems taken from (Andrei, 2008). We selected 15 large-scale test problems in extended or generalized form (see Appendix); for each function we carried out numerical experiments with different numbers of variables.
We have compared the performance of these algorithms with the method given in equation (18).