Mathematical Programming Glossary - P
Mathematical Programming Glossary - P
Packing problem. The set packing problem is the question, "Does a collection of sets contain at least K mutually disjoint subsets?" (K is a positive integer not greater than the number of sets given.) This is solvable in polynomial time if no set has more than 2 members. Otherwise, it has the NP-hard   equivalent integer program: Max{Sum_j{x_j}: Ax <= 1, x in {0,1}n}, where x_j = 1 if element j is selected; else, x_j = 0. The matrix A has 0's and 1's, where the i-th row corresponds the i-th set to be packed: A(i, j)=1 means element j is in set i; else, A(i, j)=0. The constraint Ax <= 1 means that at most one element of each set can be selected. If the IP maximum is less than K-1, or if it equals K-1 = number of elements, the answer to the set packing problem is "No". Otherwise, let x_1, ..., x_(K-1) be among those elements selected (re-index, if necessary). Let S_i be all sets containing x_i. Due to the packing constraints, no selected element can be in two sets, so these are disjoint. The remaining set, S_K, is composed of all other elements (x_K, ..., x_n), and must be disjoint from the other sets, so the answer to the set packing question is "Yes" (and we can construct the disjoint subsets).

A special case is the node packing problem on a graph: find a set of nodes of maximum cardinality such that no node is incident with the same edge (or arc). In that case, every row of A has exactly two 1's, and this is sometimes called degree-2 inequalities. Another special case is the bin packing problem.

The IP formulation has the natural extension: Max{cx: Ax <= b, x in {0,1}n}, where c >= 0 and b >= 1. This allows up to b_i elements to be selected from the i-th set, and elements can be weighted by value (c).

Parallel algorithm. This is an algorithm that executes more than one instruction at a time by having a computing environment with more than one processor. The types of parallel computing environments are:

    MIMD: Multiple Instruction/Multiple Data (includes massively parallel);
    MISD: Multiple Instruction/Sequential Data (unusual);
    SIMD: Sequential Instruction/Multiple Data (e.g., array processor).

Parallel tangents (PARTAN). An algorithm developed from the zigzag phenomenon observed using Cauchy's steepest descent. It takes two gradient steps, then performs a line search on the line through the first and last points (xk+2–xk). (For a quadratic objective, PARTAN is equivalent to the conjugate gradient method.)

Parameter. A constant in a mathematical program, not subject to choice in the decision problem, but one that could vary outside the control of the decisions. Examples are supplies, demands, loss factors, exponents and coefficients in polynomial functions (of the decision variables). Not all coefficients are parameters, as many are zero by the logic of the model. For example, the only data for a standard transportation problem are the costs, supplies and demands. These can depend upon parameters, but the LP matrix does not – it is the incidence matrix of the network. In general, parameters are data-dependent constants, rather than logically fixed for all instances of the model. Some parameters are simply units of measurement, such as the amount of energy (Btu) in a ton of coal, whereas some parameters are uncertain, like demand for a product.

Parametric analysis. The sensitivity analysis of parameters, which can be vectors in any designated region.

Parametric programming. Solving a family of mathematical programs over a parameter space. For example, consider the family of linear programs:

Min cx: Ax = b + td, x >= 0,

where t is a (scalar) parameter and d is a specified direction vector. Starting with t=0, the LP is solved, then an optimal basis is found (if possible) that remains optimal as t is increased. The max value of t is determined; if this is finite, the basis changes to a new optimal basis (for the max t value) such that t can be further increased, if possible. This is continued until either t cannot be increased any further or a basis is found that remains optimal on the interval, [t_k, inf), where 0 < t_1 < t_2 < ... < t_k are the break points of the optimal objective value as a function of the parameter.

Pareto optimum. See multiple objectives.

Partial conjugate gradient method. The conjugate gradient method is performed for some number of iterations, say k (< n), then it is restarted. (If k=0, this is the special case of Cauchy's steepest descent.)

Partial quasi-Newton method. A quasi-Newton method is performed for some number of iterations, say k (< n), then it is restarted.

Partial solution. Some of the variables are specified. This arises in a branch and bound algorithm, where some variables have been fixed by the branching process. It also arises in bidding algorithms.

Partially ordered set (or poset). A set plus a binary relation on that set that is reflexive, antisymmetric and transitive. This arises in the presence of precedence constraints, and other relations that arise in combinatorial programs.

Particle Swarm Optimization (PSO). This is an evolutionary algorithm based on swarm behavior of animals, like bird flocking. A move from a state is influenced by directions of states in its neighborhood. A consensus function is used to average neighbors' best fitness values, and this is combined with the original state's fitness to obtain a new position for the state. It applies to continuous variables, using a velocity term to derive state increments. The fundamental equation that governs state evolution is

vp = wvp + A r (x*p – xp) + B r' (g* – xp )

where p is a particle, or state, and
vp = velocity vector of particle p
xp = position vector of particle p
x*p = previous position of particle p giving best fitness value
g*p = position of globally best fitness value
w = parameter, called "inertia weight"
r, r' are pseudo-random numbers in [0,1]
A, B = positive parameters, generally in [1,2]

Partitioning problem. This is the combination of packing and covering. Its IP form is Opt{cx: Ax=b, x in {0,1}n}, where Opt could be Min or Max, and A is a 0-1 matrix. The equation A(i,.)x=b_i means that exactly b_i elements must be selected from set i. In particular, a multiple choice constraint is Sum_j{x_j}=1.

Path. As a (differentiable) curve, see path following. As a finite sequence, see graph or network.

Path following. In this context a path is a piecewise differentiable curve in space. The idea is to follow such a path (if one exists) from some initial point to a solution. One example of a path following algorithm is the barrier penalty function, with the path created by the parameter u in the solution, x*(u) in argmax{f(x) + uP(x): x in X^0}, where X^0 is the strict interior of the feasible region. (This is called the central path, or the path of the analytic center.)

Pattern search. This is a heuristic (in the sense of not guaranteeing convergence) for unconstrained optimization. It consists of two basic steps, where x is a current point and w is a vector of widths (both having been initialized).

  1. Exploration move. Set y=x and do for j=1,...,n:
    If f(y + w_j e_j) > f(y), replace y with y + w_j e_j.
    If f(y - w_j e_j) > f(y), replace y with y - w_j e_j.

    If x=y at the end of this, go perform a pattern move. Otherwise, set v = y-x (the change) and x = y (move to better point).

  2. Pattern move. If f(x + v) > f(x), replace x with x+v. Otherwise, reduce widths and return to exploration move.

The idea is to perform exploration moves as long as they are successful. Then, saving the last direction (v) when it was successful, this is used to advance x if it is an improvement, giving a new base point (x) for the next series of exploration moves. Termination occurs when the widths become too small.

Penalty function. The traditional concept is to augment the objective with a function and one positive constant, so that the original mathematical program is replaced by solving a parametric family of the form Max{f(x) - uP(x): x in X^0}. The function, P, is called a penalty function if it satisfies P(x) > 0 for x not feasible and P(x)=0 if x is feasible. The set X^0 depends on the type of penalty function, and there are two classical types, each requiring P to be continuous: interior (or barrier) and exterior. A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.

More generally, a penalty function could involve many parameters, such as the Lagrangian, or it could be stated in parameter free form. A general model is the generalized penalty-function/surrogate dual, and the notion of exact penalty function is that of a strong dual.

Perturbation. An imprecise term that means changing some parameter value (or function) and seeing what effect this has on the solution. This embodies continuity and differentiability properties of the optimal responses: objective value, policy set, and perhaps dual values too. This is what is sometimes called marginal analysis of solutions, and changes are limited to some neighborhood of the initial values. Some use perturbation function to mean the optimal objective value, even for large changes in the parameters, as in parametric programming.

Phase I & Phase II. Phase I of a mathematical program is finding a feasible solution, and Phase II is entered with a feasible solution to find an optimal solution. The standard Phase I is:

Max Sum_i{u_i} + Sum_i{v_i + w_i}: u, v, w >= 0,

g(x) - u <= 0, h(x) - v + w = 0.

Then, for any x, one could set u = g(x)^+, v = h(x)^+ and w = -h(x)^- to have a feasible solution to the Phase I mathematical program. The optimal objective value is 0 if, and only if, u=0 and v=w=0, in which case x is feasible in the original mathematical program (to commence Phase II). The Phase I problem is sometimes called an elastic program (thinking of the artificial variables (u,v,w) as providing "elastic" behavior in the levels of constraint violation).

In linear programming, the standard Phase I & II are:

I. Min ev: x, v >=0, Ax + v = b, and
II. Max cx: x >= 0, Ax = b, where b >= 0 (multiply by -1 for i: b_i < 0).

Pivot. This is the algebra associated with an iteration of Gauss-Jordan elimination, using the forward transformation. The tableaux for a pivot on element a(p,q) (not= 0), which means nonbasic variable x_q enters the basis in exchange for basic variable x_p, are as follows:

Before pivot:
      Basic        |  Nonbasic    
      Var.   Level | x_j     x_q    
      =============================
                   |
      x_i    b_i   | a(i,j)  a(i,q)
                   | 
      x_p    b_p   | a(p,j)  a(p,q)*
                   |
      =============================
      obj    -z    | d_j     d_q
      =============================  
After pivot:
      Basic                         |              Nonbasic      
      Var.  Level                   | x_j                           x_p      
      ============================================================================
                                    | 
      x_i   b_i - b_p a(i,q)/a(p,q) | a(i,j) - a(i,q)a(p,j)/a(p,q)  -a(i,q)/a(p,q)
                                    | 
      x_q   b_p/a(p,q)              | a(p,j)/a(p,q)                       1/a(p,q)
                                    | 
      ============================================================================
      obj   -z - b_p d_q/a(p,q)     | d_j - a(p,j)d_q/a(p,q)           -d_q/a(p,q)
      ============================================================================ 
A pivot is primal degenerate if the associated basic solution (x) does not change (i.e., the nonbasic variable enters the basis, but its level remains at the same bound value, in which case no basic variable changes level). Similarly, the pivot is dual degenerate if the associated dual solution (i.e., pricing vector and reduced costs) does not change. For dealing with degenerate pivots, see Bland's rule and the TNP rule.

Pivot selection. In the simplex method, this is a tactic to select a basis exchange. The incoming column is based on its effect on the objective function, and the outgoing column is based on its effect on feasibility.

P-matrix. A square matrix with all of its principal minors positive. This includes all symmetric, positive definite matrices. Here is an example of a P-matrix that is not positive definite:

The principal minors are positive, but (1,1)A(1,1)t < 0. The significance of this class is in the theorem:
The linear complementarity problem, defined by (M, q), has a unique solution for each q in Rn if, and only if, M is a P-matrix.

Pointed cone. A convex cone, C, such that C(–C)={0}. An example is C=Rn+ (so –C=Rn–).

Point-to-set map. Image of map is a set. If F:X    >2Y, this means F(x) is a subset of Y for each x in X. An example is the feasibility map.

Polar cone, of a set S in Rn. {(y, y0) in Rn+1:   yx <= y0 for all x in S}.

Policy iteration. This is an algorithm for infinite horizon dynamic programs (generally stochastic) that proceeds by improving policies to satisfy the fundamental equation:

F(s) = r*(s) + Sum_s'{P(s,s')F(s')},

where F(s) is the maximum expected value when starting in state s, r*(s) is the immediate expected return when in state s and following an optimal policy (a decision rule), and P(s,s') is probability of a transition from state s to state s' in one time period.

The algorithm has some policy, x*(s), at the beginning of an iteration. This determines an approximation of F, which is exact if the equation holds. If the equation is violated, the violation identifies how the policy can be improved. This changes the policy for the next iteration. Convergence depends upon the underlying Markov process (e.g., whether it is ergodic). Another approach is value iteration.

Polyhedral annexation. This is a cutting plane approach to find an optimal solution known to lie at an extreme point of a polyhedron, P. The general algorithm is to start at some extreme point and solve the polyhedral annexation problem. This will result in ascertaining that the extreme point is (globally) optimal, or it will generate a recession direction from which a convexity cut is used to exclude the extreme point. The approach generates a sequence of shrinking polyhedra by eliminating the cut region. Its name comes from the idea of annexing a new polyhedron to exclude from the original, homing in on the extreme point solution.

Notes:

When applied to reverse convex programming, the convex set (S) in the polyhedral annexation problem is the objective level set, {x: f(x) <= f(x*)-t}, where t > 0 is an optimality tolerance, and x* is the best (extreme point) solution so far. The initial polyhedron is the feasible region, so this is an inner approximation strategy.

When applied to mixed integer programming, the original polyhedron is the LP relaxation, and the convex set (S) for the polyhedral annexation problem can vary according to the particular problem (exploiting structure, especially to find facets of the feasible region).

Polyhedral annexation problem. Given a convex cone C, a polytope P contained in C, and a compact, convex set S with 0 in int(S), find a point y in P\S, or ascertain that P is contained in S.

Polyhedron (pl. polyhedra). A set that equals the intersection of a finite number of halfspaces. This is generally written as {x: Ax <= b}, where the representation (A,b) is not unique. It is often useful to separate the implied equalities: {x: Ax <= b, Ex = c}, so that the relative interior is {x: Ax < b, Ex = c}. The system, {Ax <= b, Ex = c}, is a prime representation if it is irredundant, and it is minimal if it is irredundant and contains no implied equality. A polyhedron is degenerate if it contains an extreme point that is the intersection of more than n halfspaces (where n is the dimension of the polyhedron). An example is the pyramid.

Polymatroid. Let N be a finite set and let f be a submodular function on the subsets of N with f(Ø)=0. The polymatroid associated with (N,f) is the polytope:

{x in Rn+: Sum{j in S} xj <= f(S)   for all S in N}.

Polytope. A bounded polyhedron.

Pooling of inventory. When there are two or more demand points for a product, it may possible to save money if the separate inventories for these demand points can be combined or “pooled.” There is a “square root economy of scale” from pooling for two of the simplest inventory models: a) the Economic Order Quantity (EOQ) model, and b) the Newsboy (NV) model.

For the EOQ case, where we want to minimize the fixed charge of ordering plus the holding cost of inventory carried, if we have two separate inventories for two demand points, each with constant demand rate D, fixed charge of ordering K, and holding cost/unit of inventory h, then the minimum total cost is sqrt(2*D*K*h) at each demand point for a total cost of 2*sqrt(2*D*K*h). If we combine the two demand streams and carry a single inventory for them, then the minimum total cost is sqrt(2*(2*D)*K*h) = 2*sqrt(D*K*h), which decreases the total cost by a factor of 1/sqrt(2).

For the NV case, if we have two demands, each independently Normal distributed with mean D and standard deviation σ, a specified service level α = Prob{Demand < inventory} at an inventory point, and zα being the number of standard deviations corresponding to a left tail probability of α, then we must carry inventory D + zα*σ at each inventory point for a total inventory of 2*D + 2*zα*σ. If we can combine the two demands so we carry just one inventory, then the mean and standard deviation for the combined demands are 2*D and sqrt(2)*σ. Thus, the total inventory we must carry is 2*D + sqrt(2)*zα*σ, and there is a square root economy of pooling in the amount of safety stock we must carry. If we say that the system service level should be α, then the benefits of pooling are even greater.

There is an interesting anomaly wherein pooling can increase inventory. Specifically, for the NV problem, if instead of there being a constraint on service level, the objective is simply to minimize the expected cost of unsatisfied demand and the cost of surplus inventory, and the demand distributions at each of two demand points (with associated inventory) are right skewed, e.g., the mean is greater than the median, then pooling the two inventories may in fact result in a higher total optimal inventory. For example, if the cost/unit short is 5, the cost/unit of inventory left over is 4, the demands at each of two demand points are independent Poisson distributed with mean 3.3, then the optimal amount to stock at each location is 3. If, however, the demands are pooled and a single inventory is carried so the demand is Poisson with mean 6.6, then the optimal amount to stock is 7. So pooling increased the optimal inventory. See Myths and Counter Examples for an additional example.

Pooling problem. A type of blending problem, as follows.

    Given:
  1. A set of sources, indexed by 1,...,N_s. Each source supplies raw feeds in limited amounts, such as crude oil.
  2. A set of attributes, indexed by 1,...,N_a. Each supply contains one or more attributes. An example of an attribute is the percent of sulfur in a crude oil supply.
  3. A set of products, indexed by 1,...,N_d, for which there is demand; each product has a quality defined by its attributes. For example, if percent sulfur is an attribute, each refined petroleum product has a range of percent sulfur: 1.5% sulfur is a higher quality product than 2.5% percent sulfur.
  4. A set of pools, indexed by 1,...,N_p. Each pool is formed by inflows from sources, whose attributes are linearly mixed to determine the attributes of the pool, and hence its quality. Pools combine to form products (see Flow of Materials below). For example, suppose a pool receives flow values x and y from two sources whose attribute values are A and B, respectively. Then, the pool's attribute value is the average: (Ax+By)/(x+y).
                            Flow of Materials
                            ~~~~~~~~~~~~~~~~~
                Sources          Pools         Products
                         S(s,p)         P(p,d)
      SUPPLY --->(s)------------->(p)------------>(d)--->DEMAND
    
The objective is to minimize cost, which is the sum of source flow costs, subject to four types of constraints:
   1. Limited supplies:  Sum_l{S(s,l)} <= SUPPLY(s).
                         -------------
                          Total flow out
                          of source s
                         (into pools)

   2. Balances at pools:  Sum_s{S(s,p)} - Sum_d{P(p,d)} = 0.
                          -------------   ------------
                           Total flow      Total flow
                           into pool       out of pool
                          (from sources)  (to products)

   3. Quality constraints:

                           Sum_s{w(s,a)S(s,p)}
      L(p,a) <=  Q(p,a) = --------------------- <= U(p,a).
                               Sum_s{S(s,p)} 

      The numerator is a weighted sum, where the weights (w) are
      the (given) quality values of attribute a at the sources (s). 
      The denominator is the total flow into the pool (p).

   4. Demand requirements:  Sum_p{P(p,d)} >= DEMAND(p).
                            -------------
                             Total flow
                             into product
                            (from pools)                      

Portfolio selection problem. In its elementary form, this is the same as the capital budgeting problem, except that the objective is to minimize the risk, rather than maximize expected return. Let x_j be the percent of capital invested in the j-th opportunity (e.g., stock or bond), so x must satisfy x >= 0 and Sum_j{x_j}=1. Let v_j be the expected return per unit of investment in the j-th opportunity, so that vx is the sum total rate of return per unit of capital invested. It is required to have a lower limit on this rate: vx >= b (where b is between Min{v_j} and Max{v_j}). Subject to this rate of return constraint, the objective is the quadratic form, x'Qx, where Q is the variance-covariance matrix associated with the investments (i.e., if the actual return rate is V_j, then Q(i,j) = E[(V_i - v_i)(V_j - v_j)].

Positive definite matrix (A). x'Ax > 0 for all nonzero x.

Positive semi-definite matrix (A). x'Ax >= 0 for all x.

Postman problem. See Chinese postman problem.

Posynomial. A positive sum of monomials: Sum_i{c(i)Product_j[x(j)^a(i,j)]}, where c > 0. Each monomial is the product,

Product_j[x(j)^a(i,j)] = x(1)^a(i,1) * x(2)^a(i,2) *...* x(n)^a(i,n),

and [a(i, j)] is called the exponent matrix. This is the fundamental function in geometric programming.

Example: x(1)x(2) + 2x(3)/x(1)^2 is a posynomial with 2 monomial terms and 3 variables. The exponent matrix is 2 by 3, showing the exponent in each monomial (row) of each variable (column):
               _      _ 
              |  1 1 0 |
              |        |
              | -2 0 1 |
              |_      _|   

Precedence constraint. When ordering objects, like jobs to to be performed, this is a constraint that restricts the order: i must precede j, denoted i << j. If order really means time, and if the model has decision variables t_i and t_j to denote the start times of i and j, resp., this precedence constraint can be written as t_j >= t_i + T_i, where T_i is the time job i takes. On the other hand, a precedence constraint need not correspond to real time. For example, i << j could mean if project j is not selected, we cannot select project i. In that case, suppose the model has binary variables x_i and x_j, where x_i=1 means project i is selected, and x_i=0 means it is not selected. Then, the precedence constraint i << j is represented as: x_i <= x_j.

Preconditioning. A generalization of scaling that modifies the hessian so as to improve convergence properties. In the case of quadratic programming, this typically means multiplying the quadratic form matrix, Q, by a preconditioner, P, so that the condition number of PQ is as close to 1 as possible. There are many variations, including splitting a matrix, Q = P - S, to achieve a better conditioned matrix in the sense of its eigenvalue (and eigenspace) structure that governs convergence properties of algorithms like steepest ascent and conjugate gradient.

Predictor-corrector algorithm. This is a class of algorithms whose origin is in ordinary differential equations. One first "predicts" a solution with either a functional approximation or an algorithm, getting "close" to a solution. Then, this is the initial point for a "corrector" method, such as using Newton's method to get "closer" to the curve. Although it could be used to characterize many methods in mathematical programming, it arises most naturally in path following methods. The term began being widely used in connection with following the central path in an interior point method. One such algorithm is using the affine scaling direction as a predictor and centering as a corrector.

Pre-processing. Generally the same as presolve, but the purpose need not be to speed up optimization. Instead, it could be aimed at deepening one's knowledge about the mathematical program, such as what is forced by implication of the constraints versus what is determined by the economic trade-off.

Presolve. A heuristic applied to reduce the problem in some way before starting an algorithm. In linear programming, for example, one might scan for an equation of the form x=0, then simply fix x at zero, thus reducing the number of variables and constraints. Further details are in the supplement, Presolving A Linear Program.

Pricing. This is a tactic in the simplex method, by which each variable is evaluated for its potential to improve the value of the objective function. Let p = c_B[B^-1], where B is a basis, and c_B is a vector of costs associated with the basic variables. The vector p is sometimes called a dual solution, though it is not feasible in the dual before termination. p is also called a simplex multiplier or pricing vector. The price of the j-th variable is c_j - pA_j. The first term is its direct cost (c_j) and the second term is an indirect cost, using the pricing vector to determine the cost of inputs and outputs in the activity's column (A_j). The net result is called the reduced cost, and its value determines whether this activity could improve the objective value.

Other pricing vectors are possible to obtain information about the activity's rates of substitution without actually computing r = [B^-1]A_j. If p = v[B^-1], then pA_j = vr, and v = c_B is the special case to get the reduced cost. Another special case is to obtain information about whether the (nonbasic) activity would need to perform a degenerate pivot. For LP in standard from, let v_i=1 if x_i=0 and v_i=0 if x_i > 0. Then, vr = Sum_i{r_i: x_i=0}. Thus, if pA_j > 0, activity j must have a positive tableau value with respect to some basic variable whose level is zero, so the pivot would have to be degenerate.

Primal Degeneracy. See Degeneracy

Primal method. An algorithm for which each iterate is feasible in the (primal) mathematical program.

Primal program. The original mathematical program, cited in the context of having its dual.

Prime representation. A set represented by an irredundant system of inequalities. (See polyhedron.)

Principle of optimality. The idea that an optimal path, or trajectory, is composed of optimal sub-paths. Equivalently, once we reach a particular state in a sequential decision process, the remaining decisions must be optimal with respect to that state. This is the foundation of dynamic programming.

Problems. There are various problems that have evolved that are popularly cited by name.

Product form of basis. See factored form of basis.

Product mix problem. To determine what mix of products to make from shared resources. For example, if a company has apples and cranberries, one could make a certain amount of apple juice, cranberry juice, cranapple juice, and applecran juice, each differing in the percentages of apples and cranberries. More extensive examples abound to illustrate one of the early, fundamental applications of linear programming.

Production scheduling problem. To determine levels of production over time. Constraints include demand requirements (possibly with backordering), capacity limits (including warehouse space for inventory), and resource limits. One basic model is as follows.

Let
x(t) = level of production in period t (before demand);
y(t) = level of inventory at the end of period t;
U(t) = production capacity in period t;
W(t) = warehouse capacity in period t;
h(t) = holding cost (per unit of inventory);
p(t) = production cost (per unit of production);
D(t) = demand at the end of period t.

Then, the mathematical program is

Min px + hy: 0 <= (x, y) <= (U, W), y(t+1) = y(t) + x(t) - D(t) for t=0,...,T.

y(0) is the given initial inventory, and T is the planning horizon.

The condition that y >= 0 means there is no backordering. Other tacit assumptions, which could be relaxed to gain more scope of the model are as follows.

Also see warehouse problem.

Projected gradient method. A feasible direction is obtained for the region {x: Ax=b} by projecting the gradient of the objective function to the null space of A: d = P grad_f(x), where P = [I - A'[AA']^-1 A] (note A[Py] = 0 for all y, so {Py} is the null space of A). Thus, x + ad is feasible for all feasible x and all a > 0 (since Ad=0). Further, if Pgrad_f(x) not= 0, f(x+ad) = f(x) + a grad_f(x)'Pgrad_f(x) > f(x) for a > 0, so the objective value improves each iteration until its projected gradient is null. At that point where Pgrad_f(x)=0, x satisfies first-order conditions.

Proper optimum. A local optimum that is uniquely optimal in some feasible neighborhood.

Pseudo-boolean function. This maps {0,1}^n to R.

Pseudo-boolean program. X={0,1}^n and all functions are pseudo-boolean.

Pseudoconcave function. Negative is pseudoconvex.

Pseudoconvex function. f is in C^1 and satisfies the property: grad_f(x)(y-x) >= 0 implies f(y) >= f(x). While every differentiable convex function is pseudoconvex, arctan(x) is an example of a pseudoconvex function that is not convex.

Pseudocost. A numerical value put on a 0-1 variable that estimates the change in objective value for setting the variable to 0 or 1, called up-pseudocost and down-pseudocost, respectively. These are used to guide branch and bound in node selection.

Pseudo-inverse (of a matrix). Same as generalized inverse.

Pseudo-monotone function.   f: Rn –> Rn is pseudo-monotone over X (subset of Rn ) if

f(y)T (x – y) >= 0 implies f(x)T (x – y) >= 0 for all x, y in X.

The gradient of a pseudoconvex function is a pseudo-monotone function.

 


Notation

Send questions and comments to icsMPGlossary@mail.informs.org.
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2008, Mathematical Programming Glossary, by the INFORMS Computing Society