Conjugate function. The convex conjugate of f on X,
denoted f* on X*, is the greatest convex approximation from below:
f*(x*) = Sup{ x*x' - f(x): x in X},
and X* = {x*: f*(x*) < infinity} (=
effective domain of f*). The concave conjugate of f on X,
denoted f^ on X^, is the least concave approximation from above:
f^(x*) = Inf{ x*x' - f(x): x in X},
and X^ = {x*: f^(x*) > -infinity}. This is a foundation for
Lagrangian duality, viewed in
response space.
Conjugate gradient method. An ascent method for unconstrained
optimization such that successive directions are
conjugate with respect to the
hessian for a quadratic objective function.
Conjugate vectors. With respect to a symmetric matrix, A, u and v
are conjugate if u'Av = 0. (In particular, if A=I, u and v are conjugate
if they are orthogonal.)
Connected network. In the undirected case, a
graph is connected if, for any pair of nodes,
there is a path that contains both of them. In a directed graph, or
network, the path may be required to be
directed (i.e., follow the orientations of the arcs), in which case the network
is strongly connected; or, the path may be allowed to ignore the
arc orientations, in which case the network is weakly connected.
Consistent. Pertains to a system of equations and
inequalities, for which there exists a
feasible solution.
Constraint. Usually this is a relation of the form of an
inequality, g(x) <= 0, or equation, h(x)=0. More generally,
it can be any restriction the decision variables must satisfy.
For example, some regard "x must be integer-valued"
as a constraint, while others would say that this is simply the
domain of the decision variables in the mathematical program.
There are other relations, such as the logical form of a
precedence constraint:
IF x=0 THEN y=0.
Some refer to an equilibrium constraint as one that solves a
complementarity problem,
satisfies a
variational inequality,
or a generalized equation.
An example of a mathematical program with an equilibrium constraint
(MPEC) is:
min f(x, y): x in X, and
(y >= 0, F(y; x) >= 0, y'F(y; x) = 0),
where F(y; x) is read as a function of y, given x.
The last set of constraints can also be stated as:
y solves the complementarity problem induced by F.
Constraint qualification. Conditions on the constraint functions
(g and h) that are sufficient to make the
Lagrange Multiplier Rule valid. Here is an example to
illustrate what can go wrong: Max x: x^2 <= 0. Since x=0 is
the only feasible solution, it is optimal. The Lagrange Multiplier
Rule requires that there exist u for which f' - ug' = 0,
but f' = 1 and g' = 0, so no such u can exist.
Slater used this example in illustrating his
interiority condition.
The classical qualification, given by
Lagrange's multiplier
theorem without inequality constraints, is that grad_h(x) have
full row rank, which stems from the
Implicit Function Theorem.
Another constraint qualification is that all constraint functions be
affine (even with
redundant constraints).
Each constraint qualification gives a sufficient condition for the LMR
to be valid. A constraint qualification is necessary if
it must hold in order to guarantee that the LMR is valid for all f in
C^1 having optimal value at x.
Continuous Program.
A mathematical
program with continuous variables. Moreover, the term is commonly used
to further imply that the
objective and
constraint funtions are
continuous, an assumption that allows certain analytical results. For
example,
Weierstrass first proved that continuous functions attain
both a maximum and a minimum over a
compact set.
If the problem contains both continuous and integer variables, then
the problem is said to be a
mixed integer program.
Contour (or isoquant) of a function, f. The projection of its
graph into its
domain: {x in X: f(x) = c}, where c is the contour value.
Contraction map (or contractor).
One that brings points closer together, forming the foundation
for convergence by
successive approximation.
Mathematically, we have f:X-->X, where X is in a
normed space, with norm of x denoted by ||x||.
f is a contractor if there exists a constant K in [0,1) such that
||f(x)-f(y)|| <= K||x-y|| for all x and y in X. (See
Banach's fixed point theorem.)
Convergence (of an algorithm). The algorithm is represented as the
point-to-set map, x' in A(x), where there is some
selection function to choose x' when A(x) has more than
one member. Convergence means that the sequence, {x^k}, has a limit
point, say x, such that x satisfies certain conditions. Ideally, the
conditions are that x is an
optimal solution, at least locally, but this is often not the
definition used in a convergence theorem. (For example, in
unconstrained optimization, most algorithms converge to a
stationary point, which need not be
even a local optimum.)
Let {s^k}-->0, where s^k is a scalar series pertaining to the series {x^k}.
Typically, s^k = ||x^k - x||, which is sometimes called policy convergence
(to a solution, x). We could also have s^k = f(x^k) - f*, where f is the
objective function, in which case the
concern is with value convergence to an optimal value, f*.
Related terms and concepts
- Dual convergence.
Dual values, notably
Lagrange multipliers, converge to a dual solution.
- Geometric convergence. Same as linear convergence,
but usually used when the sequence is exactly a geometric
progression: s^k = r^k(s^0).
- Global convergence. Usually means the same as globally
convergent, but some mean that the algorithm convergences to a
global solution.
- Globally convergent. Convergence to a solution
from any starting point.
- Linear convergence. Order = 1 and
rate < 1.
- Local convergence. Some mean locally convergent, but
some mean that the limit point is a local optimum (or just
satisfies necessary conditions – see
Myth NLP-5 to avoid
misconception).
- Locally convergent. There exists an open neighborhood
of 0 such that {s^k}-->0 from any s^0 in the neighborhood.
- Order of convergence.
Sup{p: lim ||s^(k+1)|| / ||s^k||^p < inf}.
Order=1 is linear and order=2 is quadratic
convergence.
Most (non-finite) algorithms to solve mathematical programs are
between linear and quadratic.
Let v^k = ||s^k|| and suppose v^0 = 1 (for notational convenience).
The term order derives from the approximate equation,
v^(k+1) := r(v^k)^p, where r is the
rate. If
this equation were exact, we would have v^k = r^k if p=1, and
v^k = r^[(p^k-1)/(p-1)] if p > 1, for all k.
If r=.1 and p=1, we gain one decimal place each
time: v^1 = .1, v^2 = .01, v^3 = .001,
etc. If p=2, the number of accurate decimal places approximately doubles
each iteration:
v^1 = .1, v^2 = .0001, v^3 = .0000001,
etc. Unfortunately, the underlying equation is
not exact – see
NLP Myth-2 to avoid misconception.
Some authors call this q-order (or Q-order)
convergence to distinguish it from variations of the definition
of order. Each definition is designed to capture the notion
of how many digits of accuracy are added as the sequence
approaches its limit.
- Rate of convergence. This is generally used to mean the
limiting ratio: lim ||s^(k+1)|| / |s^k||^p,
given the order is p.
- Sublinear convergence. Order = 1 and rate = 1
(slower than all linear convergence) -- e.g., s^k = 1/k.
- Superlinear convergence. Order = 1 and rate = 0
(faster than all linear convergence) -- e.g., s^k = (1/k)^k.
- Stochastic convergence. This applies when the successor
point is a random variable, as in
simulated annealing. Here are the most common types of
convergence for a sequence of random variables, {X_n}-->X.
- with probability 1 or almost everywhere
(abbr., a.e.). P{lim X_n = X} = 1.
- in probability.
P{||X_n - X|| > e}-->0 for all e > 0.
- in distribution. The sequence of cumulative
distribution functions (cdf), converges point-wise:
F_n(x)-->F(x) for all x at which F is continuous, where
F_n is the cdf of X_n and F is the cdf of X.
- in p-th mean. E{||X_n - X||^p}-->0. For p=2,
this is called convergence in quadratic mean or
in mean square.
Convex combination. An
affine combination of vectors whose coefficients are non-negative
– i.e., Sum_k{a_k x^k} where a >= 0 and Sum_k{a_k} = 1.
Convex cost flow problem. The min-cost
network flow problem with a nonlinear convex cost function.
Convex function. Let f:X––>R.
Then, its epigraph is convex.
Equivalently, X is a convex set, and for x, y in X and a in [0, 1]:
f(ax + (1-a)y) <= af(x) + (1-a)f(y).
Convex hull (of a set). The intersection of all
convex supersets (which can be limited to
halfspaces). Equivalently, the set of all
convex combinations of
points in the set (which can be limited to convex combinations of
at most n+1 points, in n dimensions, which is known as
Carathéodory's Theorem).
Convex program. f is concave
(convex for minimization), g is convex,
and h is affine on X.
See the supplement,
Convex Cones, Sets, and Functions.
Convex set. If any two points are in the set, so is their
line segment. (See
Myth NLP-7 to avoid misconception.)
Convex simplex method. This extends the general strategy of the
simplex method in linear programming to maximize a
concave
objective over linear constraints of the form Ax=b and x >= 0.
(A form of local convergence applies when
the objective is not concave, but is in C^1.) A
tableau is maintained, but nonbasic variables need not be at zero
level. The partition is used to compute a generalized
reduced cost of the form d(x) = grad_f(x) - yA.
The y vector is determined by the
basic portion: 0 = grad_B[f(x)] - yB (so y = grad_B[f(x)]B–1).
As in the simplex method, pricing is used to determine whether there
is an improving nonbasic vector. If not, the algorithm terminates
(satisfying the
Kuhn-Tucker conditions); otherwise, a
line search is used to determine the new point, changing only
the basic variables to compensate for the change in the one nonbasic
level. If this causes a basic variable to reach zero, the basis is
changed with the pivot operation.
Convexity cut. A class of
cutting planes derived by considering a
convex set in a
polyhedron, P. The form of the cut is
and it is derived as follows.
Let x0 be an
extreme point of a given
polyhedron, which contains a given set, S.
Suppose C is a convex set in R^n whose interior contains x0 but
does not contain any point of S.
Let v1, ..., vn be linearly independent vectors in R^n, and let
{t_i} be such that t_i > 0 and [x0, x0 + t_i vi] is in C for all
i=1, ..., n (e.g., the edges emanating from x0 in V).
Define V = [v1, ..., vn] and M = [diag(t_i)V]–1
(i.e., M(i,.) = V–1/t_i). Then, the cutting plane
{x: M(x-x0) >= 1} excludes x0 without excluding any other x in S
for which V–1(x-x0) >= 0. The cut, M(x-x0) >= 1, is
equivalent to the above form, y diag(1/t_i) >= 1, where
y = x0 - Vw for some w >= 0}.
One special case is the intersection cut for a 0-1
integer program:
S = {(x, s): x in {0, 1}^n, s in Z^m, Ax + s = b};
P = {(x, s): x in [0, 1]^n, s in Z^m, Ax + s = b} /\ {x: Cx <= c},
where {Cx <= c} are the cuts already added;
x0 is an extreme point of P, obtained by solving the
LP relaxation with a
simplex method, so x0 = (u, w) is a
basic optimum.
V = B–1N, where B is the basis that transforms the original
equations to u + B–1Nw = x0 (= B–1b);
C is a sphere, {y: ||y||^2 <= 1/4}.
Another special case is the concavity cut for minimizing a
concave function over a
polyhedron, P:
S = P;
x0 is an extreme point of P that is a local maximum;
C = {t in R^n+: x0 - Vt is in P and f(x0 - Vt) >=
f(x0)-e}, where x* is the best value of f found so far and
e > 0 is an optimality tolerance.
Corner polyhedron problem. Gomory's relaxed IP that drops the
non-negativity constraints of
basic variables. Suppose x = (u,0) is a basic solution to the LP,
Opt{cx: Ax=b, x >= 0}, where u is the vector of basic variables.
Then, the corner polyhedron problem is
Opt{dv: u in Z^m, v in Z^(n-m)+, Bu + Nv =b}
(dropping u >= 0), where d = reduced cost of v in the LP solution.
(Note: cx=dv for all x=(u,v) such that Ax=b.)