Mathematical Programming Glossary - C
Mathematical Programming Glossary - C

Calculus of variations. See variational calculus.

Capacity expansion. This is an augmentation to any mathematical program that has constraints representing a capacity limit. Suppose g(x) <= b represents the limitation of using more than b units of capacity, where g(x) is the actual amount used for policy x. Then, this becomes a capacity expansion model by replacing the constraint with g(x) - vC <= b and 0 <= v <= 1. Here C is the maximum amount of new capacity, with vC the portion brought online by choice of v. This is also reflected in the objective function with an added cost term, say F(v), such that F(0)=0 and F increasing on [0, 1]. If continuous use of the capacity is not possible (i.e., one builds a new plant or not), the model further requires v to be binary-valued (0 or 1), and F(1) is called the fixed charge.

Capital budgeting problem. In its elementary form, there is a fixed amount of capital, say C, that can be allocated to any of n investments. Each investment has a minimum level, say L, and a maximum level, say U. The expected return on investment is a function, v_j(x_j), where x_j is the level of the j-th investment opportunity (L_j <= x_j <= U_j). Risk is measured by a standard deviation from the expected return, say s_j(x_j). The problem is to maximize total expected return, subject to a budget constraint: Sum_j{x_j} <= C, and a risk constraint: Sum_j{v_j(x_j) + a_j s_j(x_j)} <= b, where a_j and b are parameters. The returns on the investments could be correlated. Then, if Q is the variance-covariance matrix, the risk constraint is quadratic: vx + x'Qx <= b. (Also see the portfolio selection problem.)

Carathéodory conditions. For the classical Lagrange form, Opt{f(x): x in R^n, h(x)=0}, where f, h are in C^1, the following conditions are necessary for a feasible x to be optimal: there exists (y0, y) in R^(m+1)\{0}, called multipliers, such that

y0 grad_f(x) – y grad_h(x) = 0.

This reduces to the Lagrange Multiplier Rule when y0 is not zero (divide by y0), which must be the case if grad_h(x) has full row rank.

Caterer problem. A caterer has booked his services for the next T days. He requires r_t fresh napkins on the t-th day, t=1,...,T. He sends his soiled napkins to the laundry, which has 3 speeds of service: s=1, 2, or 3 days. The faster the service, the higher the cost, c_s, of laundering a napkin. He can also purchase new napkins at a cost, c_0. With an initial stock of N napkins, the caterer wishes to minimize his total cost. (This can be formulated as a transportation problem.)

Cauchy-Schwarz inequality. |xy| <= ||x|| ||y||, where ||*|| is the Euclidean norm. (Note: for ||x||=||y||=1, xy is the cosine of the angle between vectors x and y.) .

Ceiling. This is the integer round-up of a value, x:

Ceiling(x) = min{z: z integer, z >= x}.

Examples: Ceiling(5)=5, Ceiling(5.001)=6, Ceiling(–1.2)=–1. Also, see floor.

Central path. A differentiable curve where each point is the analytic center of a polyhedron associated with a linear program. Specifically, suppose we have the primal-dual pair:

P: Max cx: x >= 0, Ax <= b.
D: Min yb: y >= 0, yA >= c.

Let s = b - Ax (primal slack). Then, the primal central path is:

x*(u) = argmax{cx + u[Sum_j{ln(x_j)} + Sum_i{ln(s_i)}]: x, s > 0} for u > 0,

where u is the penalty parameter. Let d = -(c - yA) (dual surplus). Then, the dual central path   is:

y*(u) = argmin{yb - u[Sum_j{ln(d_j)} + Sum_i{ln(y_i)}]: d, y > 0} for u > 0.

A key fact is that if the LP optimality region   is nonempty and bounded, the central path converges to its analytic center as u–>0.

Certainty equivalent. See stochastic program.

Certificate. This is a set of sufficient conditions for some property to hold.

Certificate of optimality consists of conditions that certify some feasible solution, x, is optimal. One class of comes from duality: let y be feasible in any dual with objective value F(y). Then, F(y)=f(x) is a certificate of optimality. In particular, if we have an LP:

max cx: Ax = b, x >= 0,
a certificate that a feasible x is optimal is the set of dual conditions:
yA >= c   and   cx=yb.
(Note the general use of duality does not require any convexity assumption, and the dual can be weak.)

Certificate of unboundedness consists of conditions that certify a mathematical program is unbounded. One class comes from duality: a primal feasible solution is found and the dual is determined to be infeasible. In particular, if we have an LP:

max cx: Ax = b, x >= 0,
a certificate that this is unbounded is that we found feasible x and determined yA >= c implies contradiction.

Certificate of infeasibility consists of conditions that certify a mathematical program is infeasible. One class comes from duality: a dual sequence is found whose objective diverges. In particular, if we have an LP:

max cx: Ax = b, x >= 0,
a certificate that this is infeasible is that we found a sequence {yk} such that ykA >= c for all k and {yk b} ––> – infinity.

Also, see the supplement, Alternative Systems.

Chance constraint. See stochastic program.

Character of solution. The idea is to define character by solution status. In linear [integer] programming, columns might be partitioned into


Rows might be partitioned into
  • inactive vs. active
  • non-binding vs. binding
  • positive dual price in some optimal solution vs. zero dual price in every optimal solution

Only some rows and columns might be partitioned, and others are free to change. The character is the property we want to fix by the partition. For example, if the LP has activities for operating a plant, we might want to specify the plant operation as a character of the solution, never mind specific levels (or maybe some minimal throughput is specified).

Then, one conducts sensitivity analysis by asking for preservation of the character. Classical LP sensitivity analysis does this by preserving the optimality of the basis; interior point analysis seeks to preserve the optimality of the partition. Both are special cases, and one might want to focus on only a portion of the LP. Here are some properties of interest:

  1. If the character holds at [c', b'] and at [c", b"], it holds for all [c, b] in their line segment.
  2. The stability region is a convex set.

This notion of character   extends naturally to combinatorial optimization, such as preserving a key portion of a TSP tour or job-order in a schedule.

Characteristic cone of a polyhedron, P. {y: x+y in P for all x in P}. If P = {x: Ax <= b}, its characteristic cone is {y: Ay <= 0}. Same as recession cone.

Chemical equilibrium problem. Min{cx + Sum_j{x_j log x_j}: x > 0, Sum_j{x_j} = M, Ax=b}, where the objective is the Gibbs free energy function for x_j = number of moles of species j, b_i = atomic weight of atom of type i, and A(i,j) = number of atoms of type i in one mole of species j. The equation, Ax=b, is mass balance. (Its significance is that it was an early application to show how an LP approach can apply to a nonlinear mathematical program.)

Chinese postman problem. A mailman starts from the post office, delivers the mail on his route, covering each street at least once, then returns to the post office. The problem is to find a traversal such that the total distance travelled is minimized. (Some streets are one-way, and the postman is driving.) Abstractly, the problem is to find a least-cost cycle in a graph that uses each edge and arc (forward direction only) at least once. This differs from the Travelling Salesman Problem in that the postman can traverse an edge more than once. A related problem is to determine whether the postman can cover his route within a prescribed distance (while starting and ending at his post office). If so, the cycle is produced. This is computationally equivalent to the minimization problem (in a complexity sense), and is sometimes taken as the problem definition.

Chinese Remainder Theorem. Let m1, ..., mk be relatively prime positive integers, and let b1, ..., bk be any integers. Then, there exists x such that

x equiv b1 (mod m1)
...
x equiv bk (mod mk)
and x is uniquely determined by m1...mk.

Cholesky factorization. Given an mxm symmetric matrix A, a lower triangular matrix, L, is obtained, called the Cholesky factor, such that A = LL'. This is particularly useful for solving linear systems, Ax=b, by using forward substitution, Ly=b, then backward substitution, L'x=y. The original algorithm assumes A is positive definite, but it can apply more generally. This is also called Cholesky decomposition.

Chvátal cut. A cutting plane of the form, Sum_j{F(A_j) x_j} <= F(b), where F is a Chvátal function.

Chvátal function. This class is defined recursively:

This arises in integer linear programming   in several ways ... see Gomory function.

Closed form solution. This is where one could express a solution (e.g., optimal) by an equation of the form:

x* = F(p),

where F is some function of the parameters, p. This is typically meant to suggest that no algorithm   is needed, but one must be careful since F is not limited in any way, such as being "easy"  to evaluate. For example, under mild assumptions, the coordinates of a global maximum of a function, f, on the domain X, is given by the following integral:

graphic

(The idea is to weight "mass" at the global maximum, and it is valid if the set of global maxima is convex.) This is not too useful, and the integral would typically have to be evaluated with a numerical method - i.e., an algorithm.

Closed function. In convex analysis, this is where its epigraph is a closed set. (If f is concave, it is its hypograph that must be closed.)

Closed map. Let A:X-->2^X and suppose {x^k}-->x and {y^k}-->y such that y^k is in A(x^k) for all k. Then, A is closed at x if y is in A(x). The map A is closed on a subset of X, say S, if A is closed at each x in S.

Closed set. One that contains its limit points: if {x^k}-->x and each x^k is in X, then x is in X. By the logic of the definition, an empty set is closed (it is also open). A finite number of intersections of closed sets is a closed set, so the feasibility region of a mathematical program in standard form is closed if each constraint function is continuous. (g can be only upper semi-continuous for the constraint, g(x) <= 0.) An example of a set that is not closed is: {x: x < 0}. The point 0 can be approached by a sequence, say x^k = 1/k, but 0 is not in the set. The closure of a set is the union of the set and all of its limit points.

Closure condition. That the closure of a nonempty strict interior equal the level set: cl{x in X: g(x) < 0} = {x in X: g(x) <= 0}. Here is an example where the closure condition fails:

                 _
                | -1-x        if       x < -1
                |  0          if -1 <= x <  0
         g(x) = | 
                | -1+(x-1)^2  if  0 <= x <  2
                |_ x-2        if  2 <= x        
Note that g is continuous and quasiconvex. Its level set is {x: g(x) <= 0} = [-1, 2], but the strict interior, is (0, 2), so its closure is only [0, 2]. We lose the flat portion, [-1, 0).

This is important in the use of interior methods and in stability. Here are some functions that satisfy the closure condition:

When equality constraints are present, there are two forms of extension of the closure condition: to consider the relative strict interior, and to consider "feasible sequences". The first generally assumes h is affine, so the closure condition becomes: cl{x: Ax=b, g(x) < 0} = {x: Ax=b, g(x) <= 0}. The second defines a family of maps:

I_i- = {x in X: g(x) < 0, h_k(x) = 0 for k not= i, h_i(x) < 0}. I_i+ = {x in X: g(x) < 0, h_k(x) = 0 for k not= i, h_i(x) > 0}.

Then, the closure condition is that I_i- and I_i+ are not empty, and

cl(I_i- ) = {x in X: g(x) <= 0, h_k(x) = 0 for k not= i, h_i(x) <= 0} cl(I_i+) = {x in X: g(x) <= 0, h_k(x) = 0 for k not= i, h_i(x) >= 0}

for all i.

Coercive function.  f:Rn-->Rn is coercive with respect to X (a subset of Rn) if there exists a vector x0 in X such that

limx in X, ||x||-> inf |<f(x)-f(x0), x - x0>| = inf,
where any norm can be used. This arises in
variational inequalities (and complementarity problems).

Some people use a different definition, where f:Rn->R (i.e., a scalar, real-valued function):

|f(x)| / ||x|| -> inf
as ||x|| -> inf. Note that the two definitions differ, even for n=1. For example, f(x)=|x| is coercive under the first definition, and it is not under the second. For the bilinear function, f(x,y)=x'Ay for some square matrix A, f is coercive if there exists a > 0 such that

y'Ay >= a||y||2

Column generation. This pertains to solving a linear program whose columns are generated during pricing. Typically, the number of columns is astronomically large, possibly infinite. An example is when solving the randomized program, as with the Generalized Lagrange Multiplier method. In that case, column generation consists of maximizing the Lagrangian. A similar view applies to Dantzig-Wolfe decomposition. From the dual view, this is a cutting plane method since generating a column in the primal corresponds to generating a constraint in the dual.

The concept applies to any mathematical program, and the randomized program model hightlights the duality and how a completely general mathematical program can be considered with (generalized) linear programming.

Combinatorial program. Let N = {1, ..., n} and consider a finite collection of subsets, say {S1, S2, ..., Sm}. For each subset there is an objective function value, f(Sk), and the problem is to optimize f(Sk). Typically, the feasible subsets are represented by inclusion or exclusion of members such that they satisfy certain conditions. This then becomes a special class of integer programs whose decision variables are binary valued: x(i,k)=1 if the i-th element is in Sk; otherwise, x(i,k)=0. (IP formulations are not always easy, and often there is more than one formulation, some better than others.)

Also see integer program.

Compact formulation. In integer programming, this refers to having a polynomial number of constraints. For example, look at the travelling salesman formulations. The linear form has an exponential number of "subtour elimination consgtraints," so it is not compact. The quadratic assignment formulation is compact.

Compact set. This has a general meaning in mathematical analysis. For mathematical programming, where we work in finite-dimensional Euclidean vector spaces, it means the set is closed and bounded. This is often an assumption about the feasibility region in order to ensure the existence of an optimum value for a continuous objective function – see Weierstrass' theorem. Any finite set is compact. An open interval (a,b) is not compact because its endpoints, a and b are limit points but are excluded from the set.

Compatibility theory. The idea that a solution's character does not change for a particular perturbation. In linear programming the character could be an optimal basis, and the theory is concerned with whether a particular basis remains optimal when the data is changed in a prescribed direction. A Fundamental Theorem of Basis Compatibility is the following:

h is an admissible direction for perturbing (b, c) if, and only if, it is compatible with some equilibrium basis.
The range of compatiblity of a basis, B, for a direction, h, is the greatest step for which B remains optimal: sup{t: B is optimal for the LP defined by r + th}. The basis spectrum   is the greatest range: sup{range(B; h): B is optimal for the LP defined by r}.

Complementarity condition. Satisfying complementary slackness.

Complementarity problem (CP). Let F:Rn–>Rn. Find z >= 0 such that F(z) >= 0 and F(z)z'=0. It is complementary because every solution has the property that either z_j=0 or F_j(z)=0 (or both) for each j. The linear complementarity problem (LCP) is when F(z)=Mz+q.

The problem generalizes to allow bounds on z: L <= z <= U. Then, F(z) is required to satisfy:

F_j(z) >= 0 if z_j = L_j

F_j(z) <= 0 if z_j = U_j

F_j(z) = 0 if L_j < z_j < U_j

This problem is regarded by some as The Fundamental Problem because it includes not only all mathematical programs that must satisfy the Kuhn-Tucker conditions, but also some other economic equilibrium problems. (Also see the variational inequality problem.)

Complementary slackness. The condition that two non-negative vectors are orthogonal. It arises in the Kuhn-Tucker conditions, where the Lagrange multiplier, say y, must be orthogonal to the (inequality) constraint functional value: y'g(x)=0. This means either y_i=0 or g_i(x)=0 for each i -- that is, if a constraint is not active, its Lagrange multiplier must be zero.

Complementary variables. If u and v are the two vectors that satisfy complementary slackness, each pair, u_i and v_i are called complementary variables. In linear programming, primal levels and dual prices are complementary variables.

Complexity. A measure of computer time or space to solve a problem by an algorithm as a function of the problem's dimensions. Suppose T(n) is the time it takes to solve an instance of the problem with dimension n. Then, the algorithm has (worst-case) time complexity K(n), if the greatest time it could take to solve an instance of the problem is O(K(n)). When K(n) is a polynomial, we say the algorithm has polynomial time complexity. The Klee-Minty polytope shows that the elementary simplex method does not have polynomial time complexity.

The average time complexity is the average (rather than worst) time of an algorithm (or class of algorithms) over some class of problems, for which some probability distribution is assumed. Absent a modifier, the complexity of an algorithm is taken to mean its worst-case time complexity.

Whereas complexity refers to the performance of an algorithm, see the notion of NP-completeness for the related meaning of problem complexity. The standard notation to describe complexity in terms of problem dimensions, say n, is O(K(n)). This ``big-O'' notation means the following: a function, f:Z+-->R, is O(K(n)) if there exists a constant, c, and N in Z+, such that f(n) <= cK(n) for all n>=N. For example, if an algorithm requires 5n^3 + 2n + 10 fundamental operations on a problem of size n, its time complexity is O(n^3). Another definition also describes the ``Omega'' and ``theta'' functional forms, which you can see with Examples and a full Supplement.

Complicating variables. Those variables which, when fixed at particular values, render the remaining mathematical program relatively easy to solve. One example is the use of binary variables for fixed charges in an otherwise linear program. More generally, see Benders' decomposition.

Component. In a graph (or network), this is a maximal connected subgraph. (The number of components is 1 if the graph is connected.)

Composite concave program. A global optimization problem whose objective is a composite concave function.

Composite function. A function of a function, of the form F(f(x)).

Concave function. Negative is convex. Equivalently, its hypograph is convex. Equivalently, the domain of f (say X) is a convex set, and for x, y in X and a in [0, 1]:

f(ax + (1-a)y) >= af(x) + (1-a)f(y).

Condition number (of a matrix). This is ||A|| ||A–1|| when A is nonsingular and ||*|| is some matrix norm. This arises in convergence analysis, where A is the hessian. Whenever A is symmetric and positive definite (as the hessian of a strictly convex function), and the matrix norm is that induced by the Euclidean norm (i.e., ||A|| := Max{||Ay||: ||y||=1}), the condition number is the ratio of the extreme values of the eigenvalues. It is often used to measure the convergence rate of an ascent algorithm, due to Kantorovich's inequality. (Unfortunately, it is erroneously used to measure the goodness of an algorithm – see Myth NLP-2.)

Cone. A set, C, with the property that if x is in C, then ax is in C for all a in R+. A convex cone is a cone that is also a convex set. Equivalently, C is a convex cone if C = C + C. (An example of a cone that is not convex is the union of the axes.) A polyhedral cone is a cone that is also a polyhedron; equivalently, C is a polyhedral cone if there exists a matrix A such that C = {x: Ax <= 0}. (An example of a cone that is not polyhedral is {(x,y,z): x2+y2–z2 =0}.)

A quadratic cone is of the form {x: x'Qx <= 0}, where Q is any (real) matrix. If Q is negative semi-definite, the cone is all of Rn space. If Q is positive definite, the cone is just the origin, {0}. So, quadratic cones usually arise when Q is indefinite. Example: {x: x >= 0, 2x1x2 >= ||(x3,...,xn)||2). See each of the following special cones:

Cone of optimality. Given an LP in standard form, let B be an optimal basis (for which the dual is feasible). Then, the cone of optimality of B is {b: B-1b >= 0}. It is so called because B remains an optimal basis for any b in this cone.

Conic program. Standard form: min{ cx: Ax=b, x in K}, where K is a cone (not necessarily convex). This is more general than it looks. Suppose S is any non-empty set, and we have the mathematical program: min{ cx: x in S}. Then, define K={t(1,x): x in S and t >= 0}. The above problem is equivalent to the conic program:

min{ cx: (x0, x) in K and x0=1}.
This problem class also includes the semi-definite program, as x could be a matrix.

Conjunctive Normal Form (CNF). A logical expression that is written as a conjunction, C1 /\ C2 /\ ... /\ Cm, where /\ is the conjunction (logical and). Each Ci is called a clause and has the form Li1 \/ Li2 \/ ... \/ Lin, where \/ is the disjunction (logical inclusive or) and each Li is a literal – a proposition or its negation.

Conjugate directions. Direction vectors chosen in an algorithm that are conjugate with previous directions, with respect to the hessian of the objective. For the unconstrained quadratic, x'Qx + cx (where Q is symmetric), direction vector d is conjugate if d'Qd^k = 0 for all previous direction vectors, {d^k}.

Conjugate duality. See it in the list of particular duals.

Conjugate function. The convex conjugate of f on X, denoted f* on X*, is the greatest convex approximation from below:

and X* = {x*: f*(x*) < infinity} (= effective domain of f*). The concave conjugate of f on X, denoted f^ on X^, is the least concave approximation from above:

f^(x*) = Inf{ x*x' - f(x): x in X},

and X^ = {x*: f^(x*) > -infinity}. This is a foundation for Lagrangian duality, viewed in response space.

Conjugate gradient method. An ascent method for unconstrained optimization such that successive directions are conjugate with respect to the hessian for a quadratic objective function.

Conjugate vectors. With respect to a symmetric matrix, A, u and v are conjugate if u'Av = 0. (In particular, if A=I, u and v are conjugate if they are orthogonal.)

Connected network. In the undirected case, a graph is connected if, for any pair of nodes, there is a path that contains both of them. In a directed graph, or network, the path may be required to be directed (i.e., follow the orientations of the arcs), in which case the network is strongly connected; or, the path may be allowed to ignore the arc orientations, in which case the network is weakly connected.

Consistent. Pertains to a system of equations and inequalities, for which there exists a feasible solution.

Constraint. Usually this is a relation of the form of an inequality, g(x) <= 0, or equation, h(x)=0. More generally, it can be any restriction the decision variables must satisfy. For example, some regard "x must be integer-valued" as a constraint, while others would say that this is simply the domain of the decision variables in the mathematical program. There are other relations, such as the logical form of a precedence constraint: IF x=0 THEN y=0.

Some refer to an equilibrium constraint as one that solves a complementarity problem, satisfies a variational inequality, or a generalized equation. An example of a mathematical program with an equilibrium constraint (MPEC) is:

min f(x, y): x in X,   and   (y >= 0,  F(y; x) >= 0,  y'F(y; x) = 0),

where F(y; x) is read as a function of y, given x. The last set of constraints can also be stated as:

y solves the complementarity problem induced by F.

Constraint qualification. Conditions on the constraint functions (g and h) that are sufficient to make the Lagrange Multiplier Rule valid. Here is an example to illustrate what can go wrong: Max x: x^2 <= 0. Since x=0 is the only feasible solution, it is optimal. The Lagrange Multiplier Rule requires that there exist u for which f' - ug' = 0, but f' = 1 and g' = 0, so no such u can exist. Slater used this example in illustrating his interiority condition. The classical qualification, given by Lagrange's multiplier theorem without inequality constraints, is that grad_h(x) have full row rank, which stems from the Implicit Function Theorem. Another constraint qualification is that all constraint functions be affine (even with redundant constraints). Each constraint qualification gives a sufficient condition for the LMR to be valid. A constraint qualification is necessary if it must hold in order to guarantee that the LMR is valid for all f in C^1 having optimal value at x.

Continuous Program. A mathematical program with continuous variables. Moreover, the term is commonly used to further imply that the objective and constraint funtions are continuous, an assumption that allows certain analytical results. For example, Weierstrass first proved that continuous functions attain both a maximum and a minimum over a compact set. If the problem contains both continuous and integer variables, then the problem is said to be a mixed integer program.

Contour (or isoquant) of a function, f. The projection of its graph into its domain: {x in X: f(x) = c}, where c is the contour value.

Contraction map (or contractor). One that brings points closer together, forming the foundation for convergence by successive approximation. Mathematically, we have f:X-->X, where X is in a normed space, with norm of x denoted by ||x||. f is a contractor if there exists a constant K in [0,1) such that ||f(x)-f(y)|| <= K||x-y|| for all x and y in X. (See Banach's fixed point theorem.)

Convergence (of an algorithm). The algorithm is represented as the point-to-set map, x' in A(x), where there is some selection function to choose x' when A(x) has more than one member. Convergence means that the sequence, {x^k}, has a limit point, say x, such that x satisfies certain conditions. Ideally, the conditions are that x is an optimal solution, at least locally, but this is often not the definition used in a convergence theorem. (For example, in unconstrained optimization, most algorithms converge to a stationary point, which need not be even a local optimum.)

Let {s^k}-->0, where s^k is a scalar series pertaining to the series {x^k}. Typically, s^k = ||x^k - x||, which is sometimes called policy convergence (to a solution, x). We could also have s^k = f(x^k) - f*, where f is the objective function, in which case the concern is with value convergence to an optimal value, f*.

    Related terms and concepts
  • Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
  • Geometric convergence. Same as linear convergence, but usually used when the sequence is exactly a geometric progression: s^k = r^k(s^0).
  • Global convergence. Usually means the same as globally convergent, but some mean that the algorithm convergences to a global solution.
  • Globally convergent. Convergence to a solution from any starting point.
  • Linear convergence. Order = 1 and rate < 1.
  • Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or just satisfies necessary conditions – see Myth NLP-5 to avoid misconception).
  • Locally convergent. There exists an open neighborhood of 0 such that {s^k}-->0 from any s^0 in the neighborhood.
  • Order of convergence. Sup{p: lim ||s^(k+1)|| / ||s^k||^p < inf}. Order=1 is linear and order=2 is quadratic convergence. Most (non-finite) algorithms to solve mathematical programs are between linear and quadratic.

    Let v^k = ||s^k|| and suppose v^0 = 1 (for notational convenience). The term order derives from the approximate equation, v^(k+1) := r(v^k)^p, where r is the rate. If this equation were exact, we would have v^k = r^k if p=1, and v^k = r^[(p^k-1)/(p-1)] if p > 1, for all k. If r=.1 and p=1, we gain one decimal place each time: v^1 = .1, v^2 = .01, v^3 = .001, etc. If p=2, the number of accurate decimal places approximately doubles each iteration: v^1 = .1, v^2 = .0001, v^3 = .0000001, etc. Unfortunately, the underlying equation is not exact – see NLP Myth-2 to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.

  • Rate of convergence. This is generally used to mean the limiting ratio: lim ||s^(k+1)|| / |s^k||^p, given the order is p.
  • Sublinear convergence. Order = 1 and rate = 1 (slower than all linear convergence) -- e.g., s^k = 1/k.
  • Superlinear convergence. Order = 1 and rate = 0 (faster than all linear convergence) -- e.g., s^k = (1/k)^k.
  • Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables, {X_n}-->X.

    • with probability 1 or almost everywhere (abbr., a.e.). P{lim X_n = X} = 1.
    • in probability. P{||X_n - X|| > e}-->0 for all e > 0.
    • in distribution. The sequence of cumulative distribution functions (cdf), converges point-wise: F_n(x)-->F(x) for all x at which F is continuous, where F_n is the cdf of X_n and F is the cdf of X.
    • in p-th mean. E{||X_n - X||^p}-->0. For p=2, this is called convergence in quadratic mean or in mean square.

Convex combination. An affine combination of vectors whose coefficients are non-negative – i.e., Sum_k{a_k x^k} where a >= 0 and Sum_k{a_k} = 1.

Convex cost flow problem. The min-cost network flow problem with a nonlinear convex cost function.

Convex function. Let f:X––>R. Then, its epigraph is convex. Equivalently, X is a convex set, and for x, y in X and a in [0, 1]:

f(ax + (1-a)y) <= af(x) + (1-a)f(y).

Convex hull (of a set). The intersection of all convex supersets (which can be limited to halfspaces). Equivalently, the set of all convex combinations of points in the set (which can be limited to convex combinations of at most n+1 points, in n dimensions, which is known as Carathéodory's Theorem).

Convex program. f is concave (convex for minimization), g is convex, and h is affine on X. See the supplement, Convex Cones, Sets, and Functions.

Convex set. If any two points are in the set, so is their line segment. (See Myth NLP-7 to avoid misconception.)

Convex simplex method. This extends the general strategy of the simplex method in linear programming to maximize a concave objective over linear constraints of the form Ax=b and x >= 0. (A form of local convergence applies when the objective is not concave, but is in C^1.) A tableau is maintained, but nonbasic variables need not be at zero level. The partition is used to compute a generalized reduced cost of the form d(x) = grad_f(x) - yA. The y vector is determined by the basic portion: 0 = grad_B[f(x)] - yB (so y = grad_B[f(x)]B–1).

As in the simplex method, pricing is used to determine whether there is an improving nonbasic vector. If not, the algorithm terminates (satisfying the Kuhn-Tucker conditions); otherwise, a line search is used to determine the new point, changing only the basic variables to compensate for the change in the one nonbasic level. If this causes a basic variable to reach zero, the basis is changed with the pivot operation.

Convexity cut. A class of cutting planes derived by considering a convex set in a polyhedron, P. The form of the cut is

Sum_i{y_i / t_i} >= 1,

and it is derived as follows.

Let x0 be an extreme point of a given polyhedron, which contains a given set, S. Suppose C is a convex set in R^n whose interior contains x0 but does not contain any point of S. Let v1, ..., vn be linearly independent vectors in R^n, and let {t_i} be such that t_i > 0 and [x0, x0 + t_i vi] is in C for all i=1, ..., n (e.g., the edges emanating from x0 in V). Define V = [v1, ..., vn] and M = [diag(t_i)V]–1 (i.e., M(i,.) = V–1/t_i). Then, the cutting plane {x: M(x-x0) >= 1} excludes x0 without excluding any other x in S for which V–1(x-x0) >= 0. The cut, M(x-x0) >= 1, is equivalent to the above form, y diag(1/t_i) >= 1, where y = x0 - Vw for some w >= 0}.

One special case is the intersection cut for a 0-1 integer program:

S = {(x, s): x in {0, 1}^n, s in Z^m, Ax + s = b};

P = {(x, s): x in [0, 1]^n, s in Z^m, Ax + s = b} /\ {x: Cx <= c}, where {Cx <= c} are the cuts already added;

x0 is an extreme point of P, obtained by solving the LP relaxation with a simplex method, so x0 = (u, w) is a basic optimum.

V = B–1N, where B is the basis that transforms the original equations to u + B–1Nw = x0 (= B–1b);

C is a sphere, {y: ||y||^2 <= 1/4}.

Another special case is the concavity cut for minimizing a concave function over a polyhedron, P:

S = P;

x0 is an extreme point of P that is a local maximum;

C = {t in R^n+: x0 - Vt is in P and f(x0 - Vt) >= f(x0)-e}, where x* is the best value of f found so far and e > 0 is an optimality tolerance.

Corner polyhedron problem. Gomory's relaxed IP that drops the non-negativity constraints of basic variables. Suppose x = (u,0) is a basic solution to the LP, Opt{cx: Ax=b, x >= 0}, where u is the vector of basic variables. Then, the corner polyhedron problem is

Opt{dv: u in Z^m, v in Z^(n-m)+, Bu + Nv =b}

(dropping u >= 0), where d = reduced cost of v in the LP solution. (Note: cx=dv for all x=(u,v) such that Ax=b.)

Correlation matrix. See elliptope.

Cost of capital. Used in computing an objective function, it is the interest rate. The units are currency per time period (e.g., dollars per year). One must be careful in computing this. For example, the marginal cost of capital can be greater than the marginal interest rate because the latter applies to the entire borrowing, not just the last dollar borrowed. (This is the "classical" definition, which does not consider advances in option theory.)

Covering problem. The idea is to select enough members in each of a specified collection of sets; that defines covering the sets. Subject to this, there is a cost for the elements selected, and the objective is to minimize total cost. The IP form of the usual set covering problem is Min{cx: Ax >= 1, x in {0,1}^n}, where x_j=1 if element j is selected; else, x_j=0. The matrix A has 0's and 1's, where the i-th row corresponds to the i-th set to be covered: A(i, j)=1 means element j is in set i; else, A(i, j)=0. The constraint Ax >= 1 means that at least one element of each set must be selected. This could be extended to require b_i elements of set i be selected by writing Ax >= b. Also see vertex cover.

CPLEX. A collection of mathematical programming software solvers.

Cramer's rule. To solve Ax=b, where A is nonsingular, the j-th coordinate is given by:

              det(A^j)
       x(j) = --------, 
               det(A)  
where det(.) is the determinant, and A^j is the matrix obtained when column j of A is replaced by b:

A^j = [A(., 1) A(., 2) ... A(., j-1) b A(., j+1), ... A(., n)]

Crash. This is a heuristic to generate an initial point for an algorithm. It is from the early linear programming computer systems that used a variety of heuristics to generate an initial basic solution.

Criss-cross method. A method in linear programming that chooses a pivot by possibly crossing from primal to dual, and vica versa. The least-index criss-cross method is a finite pivot method that chooses a pivot based on the least index of a column or row for which there is improvement in either the primal or dual infeasibility. It terminates at a basis when one of the following termination conditions is reached:

  1. Both the associated primal and dual solutions are feasible (implies optimality because complementary slackness holds by construction).
  2. There is evidence of primal and/or dual infeasibility (like the tests in the simplex method).
(Note: any basis is used to start and there is only one phase.)

Critical path. A longest path in a network, where length units are time durations. The nodes represent tasks, and the arcs represent precedence constraints. The path is critical because the associated tasks determine the total completion time of the project. Moreover, at least one of their duration times must decrease in order to decrease the total completion time.

Critical point (of a function in C^1). Where the first partial derivatives are zero or infinite.

Crossover operation. See genetic algorithm.

Cut search. An algorithm strategy for global optimization (notably for integer programming) that consists of two alternating phases: search and cut. The search phase finds linearly independent vectors emanating from a root point, x0, to setup a probe for the cut phase. Usually, x0 is an extreme point, so the search is an edge probe phase that extends edges of the cone rooted at x0 until it intersects the candidate solution points. Then, a cutting plane is added (usually a convexity cut) to exclude the root point. (A new root point is obtained by solving the new approximating problem on the smaller polyhedron.)

Cutset, In a weakly connected network, this is a set of arcs whose removal decomposes the network into exactly two components. Separating node s from node t, the value of the cutset is the sum of the capacities of its arcs from the component containing s to the one containing t.

Cutting plane. A hyperplane whose halfspace cuts off a particular point, such as a non-integer extreme point solution of a linear program relaxation of an integer program. The cut is such that it does not eliminate any solution to the original problem. For example, suppose we want to solve Max{x: (x,y) in {0,1}^2, 2x + 2y <= 1}. The only feasible point is (0, 0). The LP relaxation is Max{x: (x,y) in [0,1]^2, 2x + 2y <= 1}, and an optimal solution is at (1/2, 0). One cutting plane is {(x,y): x + y <= 1/3}. A deeper one is {(x,y): x + y <= 0}. (When adding a cutting plane to the LP, a new relaxation is obtained, and the solution is closer to the IP solution.)

Cutting stock problem. Determine a way to cut standard sheets of material into various shapes (like clothes parts) to minimize waste. This is a (linear) integer programming model: patterns are specified, and A(i, j, k) = amount of i-th stock (e.g., sheet or roll of material) used to generate the j-th output by the k-th pattern. Then, let x(k) = level of k-th pattern used and y = Ax. Thus, s(i) = Sum_j{y(i, j)} = amount of the i-th stock used, which is limited by its availability: s <= S; and v(j) = Sum_i{y(i, j)} = amount of j-th output generated, which is required to be in some range, say L <= v <= U (allowing some demand overruns or underruns). Some models seek to minimize the total waste: Sum_i{S(i) - s(i)}. Other models consider cost too. The most common problems are 2-dimensional (odd shapes from sheets of material); the 1-dimensional case is called the trim problem. In the latter case, the stock index (i) is not needed. For example, consider a stock of rolls of paper with a given width, which must be slit into rolls of various widths. Then, A(j, k) = how much of a stock roll is used by the k-th pattern to generate a roll of the j-th width. Moreover, y(j) = Sum_k{A(j, k)x(k)} = amount of a stock roll used by pattern j to generate a roll of width j.

Cyclic descent. An algorithm that seeks to optimize a multivariate function by optimizing each coordinate with a univariate optimization technique (keeping the other coordinates fixed). This is repeated until a fixed point is reached. In general, it is possible for such a fixed point not to be an optimum (even locally) because a simultaneous change in variables could result in an improvement. An example is given by:

f(x, y) = (y - x^2)(y - 2x^2) at (x, y) = (0, 0).

f(0, y) has a minimum at y=0, and f(x, 0) has a minimum at x=0. However, f decreases with simultaneous change, y=(3/2)x^2. Along this parabola, f(x,y) = -(x^4)/4, which is negative for x nonzero. Thus, 0 is not a local minimum of f. (For further discussion about this example, see Myth NLP-5.)

Cycling (in linear programming). Revisiting a basis, mainly referring to a simplex method, so that the algorithm would cycle through the same sequence of bases, not converging to an optimal one. (Example.)



Notation

Send questions and comments to icsMPGlossary@mail.informs.org.
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2008, Mathematical Programming Glossary, by the INFORMS Computing Society