Mathematical Programming Glossary - D
Mathematical Programming Glossary - D

Damped Newton Method. See Modified Newton method.

Dantzig-Wolfe decomposition. The principle applied here is that the total problem consists of:

  1. subprograms corresponding to its almost independent parts;
  2. master program that ties together the subprograms.

In
linear programming   this applies to the following block diagonal structure, linked by master constraints.
           .---------------------.
           | Master constraints  |
           |_____________________|
           .----.
           | B1 |
           |____|
                 .----.
                 | B2 |
                 |____|
                       .
                         .
                           .
                            .----.
                            | Bk |
                            |____| 
The procedure operates on the master LP with blocks used to generate extreme points as part of the pricing.

Davidon-Fletcher-Powell (DFP) method. This is a method to solve an unconstrained nonlinear program that proceeds as follows.

    Start with any symmetric, negative definite matrix, say H (e.g., -I), and any point, say x. Compute g=grad_f(x), and set each of the following:
  1. direction: d = -Hg.
  2. step size: s in argmax{f(x + td): t >= 0}.
  3. change in position: p = sd.
  4. new point and gradient: x' = x + p and g' = grad_f(x').
  5. change in gradient: q = g' - g.
  6. Replace x with x' and update H by the DFP update to complete the iteration.

Davidon-Fletcher-Powell (DFP) update. This is a way to update an approximation of an inverse hessian, used for unconstrained optimization. Using the notation in the DFP method, the update is:

                           pp'     Hqq'H
                H' = H  +  ---  -  -----.
                           p'q     q'Hq         
Note: pp' is a rank 1 matrix (the outer product of the vector p with itself).

Dead-end elimination. Same as fathoming. Used by scientists who have combinatorial optimization problems, this is more descriptive than the older term of "fathoming" a node. The idea is that a condition has been met that ensures that following some branch (i.e., setting values of variables) cannot lead to an optimal solution. This is often just a dominance property.

Decomposition principle. The idea of decomposing a mathematical program into two or more sets of variables and associated constraints. The purpose is to separate some portion with special structure from the rest of the mathematical program.

Here are some particular ones that are in this glossary:

Decoupling principle. Arises in sensitivity analysis, as a corollary to the Compatibility Theorem:

An equilibrium basis is compatible with the change (db, dc) if, and only if, it is compatible with both (db, 0) and (0, dc).

Degeneracy. The solution (x, y, s) to the primal-dual pair of linear programs:

Min{cx: x >= 0, Ax = b} and Max{yb: s >= 0, yA + s = c}
is degenerate if it is not strictly complementary ---i.e. x_i = s_i = 0 for some i. The pair (x_i, s_i) is primal degenerate if there is an optimal solution x' such that x'_i > 0. Similarly, the pair is dual degenerate if there is a dual optimal solution (y', s') such that s'_i > 0.

With regards to a basic optimal solution, such a solution is (primal) degenerate when some basic variable is at one of its bound values (canonically zero). A basic optimal is dual degenerate if one of its nonbasic variables has a zero reduced cost. Geometrically, this corresponds to a degenerate polyhedron. Suppose we have {x: Ax <= b} (where A is m by n). This polyhedron is degenerate if there exists an extreme point that is an element of the intersection of more than n hyperplanes. The pyramid is degenerate because four planes meet at a point. (Example.)

Algorithmically, degeneracy can cause cycling in the simplex method.

Degeneracy graph. An undirected graph, where the nodes represent bases and edges their adjacency. The degeneracy graph is a way to probe deeply into a degenerate extreme point (represented by more than one basic feasible solution). Among other things, it reveals good pivot steps through the point enroute to an optimum, and it provides a clear, efficient approach to sensitivity analysis.

    Here are some terms for a degeneracy graph associated with a (basic) solution, x:
  • degeneracy power of x: number of bases corresponding to x.
  • internal node: a node whose neighbors are all in this degeneracy graph.
  • transition column: a tableau column such that a pivot exists for which that column enters the basis, and the new basis corresponds to a transition node.
  • transition node: a node with at least one member outside this degeneracy graph (so a pivot moves to a different solution).
  • Transition Node Pivoting(TNP) rule: a lexicographic rule using a special column to avoid internal nodes (but cycling is possible).

Degenerate Polyhedron. See Degeneracy

Degree-2 inequality. Arises in mixed integer linear programming models for combinatorial programs on graphs, where each row of the A matrix has 2 non-zeroes (usually 1's), as in the node packing problem.

Degree of difficulty. See geometric program.

Density of a matrix. The portion of nonzeroes (= 1 – sparsity).

Detached coefficient form. Given the system Ax=b, we augment the objective equation: cx-z=0. Then, detaching the variables, the form is:

               .-        -.
               | A   0  b |
               | c  -1  0 |
               |_        _|   
Upon multiplying Ax=b by the inverse of a basis, say B, where A=[B N], the detached coefficient form becomes
             .-                     -.
             | I    B–1N     0  B–1b |
             |                       |
             | c_B  c_N     -1   0   |
             |_                     _|   
where c=[c_B c_N] (conformal with the partition of the columns of A). Now if we subtract c_B times the first m rows from the last row, this becomes
          .-                                 -.
          | I  B–1N            0      B–1b    |
          |                                   |
          | 0  c_N - c_BB–1N  -1     -c_BB–1b |
          |_                                 _|   
Recognizing that B–1b is the vector of
basic levels, and the associated objective value (z) is c_BB–1b, we can drop the first m columns plus column n+1 (corresponding to z) and form the tableau associated with this basis (adding labels to identify the basic variables in the rows and nonbasic variables in the columns).

Dichotomous search. This finds the maximum of a unimodal function on an interval, (a, b), by evaluating points placed near the center, approximating the bisection method. With e a small positive value, let x = (a+b)/2 - e and y = (a+b)/2 + e. Then, if f(x) < f(y), the new interval of uncertainty is (x, b); if f(x) > f(y), it is (a, y); if f(x) = f(y), it is (x, y). With N=2M function evaluations, the interval of uncertainty can be reduced to within (b-a)r, where

r = 2–M + [1 – 2–M]e.

(The same reduction occurs with N=2M+1, as an even number of evaluations is required.) For example, with N=20, 2–10=.0009765, so the interval reduction is .0009765 + .999756e. See fibonacci search   for a more efficient method.

Diet problem. Select levels of food consumption so as to minimize total cost subject to nutrient requirements. As a linear program, Min{cx: Ax > = b, x > = 0}, x_j = level of food j at unit cost c_j. The i-th nutrient requirement (e.g., amount of vitamin C) is b_i, and A(i, j) = amount of nutrient i in each unit of food j.

Digraph. Directed graph.

Dijkstra's algorithm. See labeling algorithm.

Dimension. 1. Of a set: the dimension of the smallest affine space that contains the set. The dimension of an affine space is k if the maximum number of affinely independent points in the set is k+1. Of particular interest is the column space of a matrix A: {x: x=Ay for some y in R^n}. The dimension of this is the rank of A. Another is the row space of A: {x: x=uA for some u in R^m}. The dimension of this is also the rank of A.

2. Of an expression: the units of measurement (e.g., tons of apples, meters of rope, hours of labor). In a relation, such as y = ax + b, the units of y, ax and b must all be the same. Dimensional analysis is concerned with determining such consistency and inferring whatever units are missing.

Diophantine equations. A system of equations whose variables must be integer-valued. A famous one is in Fermat's Last Theorem:

In general, solving the linear Diophantine equations, Ax=b for x in Zn can be solved in polynomial time. However, with bounds, 0 ≤x ≤ u, the problem is NP-complete.

Directed tree search. A class of algorithms designed to systematically search a decision space, where nodes correspond to partial solutions (sometimes called ``states''). Examples are branch and bound and implicit enumeration. Also see heuristic search.

Directional derivative. The limit (when it exists) of the rate of change along a specified direction, say h:

              f(x+th) - f(x)
        lim   --------------.
        t-->0+       t     
In particular, the i-th partial derivative is with h=e_i. (Note that the directional derivative, as defined above, depends on the scale: the derivative for kh is k times the derivative for h. To avoid scale dependence, some authors require ||h|| = 1.) Recently, some people have called this a B-derivative (or Bouligand derivative), and functions that have directional derivatives in all feasible directions   are B-differentiable. Some require the convergence to be uniform.

Discount rate (or discount factor). This accounts for the time value of money and arises naturally in financial models, such as a portfolio selection problem. A discount rate of 7% means $1 earned a year from now has a present value of approximately $93.46 (1/1.07). If $1 is earned n years from now, and the discount rate is r, the present value is $1/(1+r)^n. In continuous-time models, there are variations, such as defining the present value of K dollars at time t to be K(1-e-rt). In infinite horizon dynamic programming, the discount factor serves to make value iteration a contraction map. In that case, the fixed point of the stationary equation,

F(s) = Opt{r(x, s) + a F(T(s, x)): x in X(s)},

is obtained by successive approximation - i.e., value iteration for an infinite horizon DP. This converges to a unique fixed point (F) if 0 < a < 1. Here, the DP notation is used, where a is the discount factor.

Discrete Mathematical Program. A mathematical program in which the variables are restricted to a discrete set, commonly the integers. Subcases include integer programs and combinatorial programs.

Disjunctive Normal Form (DNF). A logical expression that is written as a disjunction, D1 \/ D2 \/ ... \/ Dm, where \/ is the disjunction (logical inclusive or). Each Di has the form Li1 /\ Li2 /\ ... /\ Lin, where /\ is the conjunction (logical and) and each Li is a literal – a proposition or its negation.

Disjunctive program. In its most general form, this seeks a solution to one of a finite list of mathematical programs:

x* in Argopt{fk(x): x in Xk} for some k=1,...,K.

Usually, they have the same objective functions, fk=f for all k, so the mathematical program with disjunctive constraints has the following form:

Opt f(x): x in X, (g1(x) <= 0 or g2(x) <= 0 ... or gK(x) <= 0 ).

This can be reformulated as a
mixed-integer program, using binary variables to represent the disjunction.

Divide and conquer. An algorithm strategy, such as dynamic programming, that divides a problem into parts. Suppose T(n) is the time complexity for a problem of size n, and that dividing the problem into s subproblems, each of size b, takes S(n) additional time. Then, T(n) = sT(n/b) + S(n) is called the divide-and-conquer recurrence relation. If S(n)=c > 0 (a constant) and b > 1, T(n) = O(n^[log_b(s)]) for n=b,2b,3b, ..., and T(n) = O(log(n)) if s=1. For example, if a bisection method is used (as in searching a sorted list), the divide and conquer recurrence is T(n) = T(n/2) + 2 for n=2,4,6,8, ... This implies T(n) = O(log(n)) since s=1 and b=2.

Dominance. This is used in many contexts, but the general meaning is that something is uniformly better than something else. For example, consider two activities in a linear program, say j and k, such that:

  • j has greater cost: c_j >= c_k
  • j produces less of each requirement: A(i, j) <= A(i, k) for i such that we require A(i,.)x >= b_i
  • j consumes more of each resource: A(i, j) >= A(i, k) for i such that we require A(i,.)x <= b_i
  • j produces or consumes at the same rate of goods to be conserved: A(i, j) = A(i, k) for i such that we require A(i,.)x = b_i
Then, activity k dominates  activity j.

Doubly stochastic matrix. A non-negative matrix such that each row and each column sums to 1.

Dual. Another mathematical program with the property that its objective is always a bound on the original mathematical program, called the primal. Suppose the dual is Min{F(y): y in Y}. Then, F(y) >= f(x) for all feasible x in the primal and all y in Y. This immediately implies that if the primal is feasible, the dual cannot be unbounded, and vice versa: if the dual is feasible, the primal cannot be unbounded. It also implies that if the dual is unbounded, the primal must be infeasible (and vice versa). A dual provides a sufficiency test for optimality, for if feasible x and y can be found such that f(x)=F(y), it follows that x is optimal in the primal and y is optimal in the dual. Weak duality is when this is all that can be guaranteed. Strong duality Strong duality is when a finite optimal value to one problem ensures the existence of an optimal solution to the other and that their optimal objective values are equal. There are particular duals of significance.

Dual Degeneracy. See Degeneracy

Dual method. An algorithm for which each iterate is feasible in a dual mathematical program. An example is that the class of cutting plane methods, associated with the Lagrangian dual, is dual to the column generation method of Dantzig-Wolfe decomposition.

Dual norm. Given the norm is L_p, where p >= 1, its dual norm is L_q, where (1/p) + (1/q) = 1. (If p=1, q=infinity; and, if p=infinity, q=1.) This is based on Hölder's inequality.

Dual price. A dual variable associated with a primal constraint.

Duality. The study and use of dual mathematical programs.

Duality gap. Numerically, it is the difference between primal and dual objective values. The phrase by itself means the presence (or existence) of a gap -- i.e., that the value is not zero.

Duality theorems. These establish the fundamental duality relations (weak or strong). The most significant ones are given here.

Dynamic program (DP). Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the DP is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process:

F(t, s) = Opt{r(t, s, x) + aF(t', s'): x in X(t, s) and s'=T(t, s, x)},

where F(t, s) = optimal total return upon reaching time t in state s, and x = decision variable(s); s, s' = state variables; t, t' = time; X(t, s) = decision space (usually depends only on state); r(t, s, x) = immediate return; T(t, s, x) = state transform.

In words, the optimal return upon reaching time t in state s equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state s'. This new state is a function of the current time, state, and decision. For discrete time, t'=t+1 for the forward equation, and t'=t-1 for the backward equation. The multiplier (a) is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems a=1, in which case the infinite horizon model might be ill posed.) More generally, the process need not be additive, so that '+' could be another composition operation.

A decision rule is a function, x*(t, s), that specifies an optimal value of x upon reaching time t in state s. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case r and F are expected values, and the state transform is random with probability distribution, P[T(t,s,x) = s' | s,x]. Thus, F(t',s') is replaced by Sum_s'{F(t',s')P[T(t,s,x)=s'|s,x]}.

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:

F(n, s) = Max{c(n)x + F(n-1, s-a(n)x): x in {0, 1, ..., |_s/a(n)_|}}.

In words, this says that the max total return with n objects having total knapsack space s equals the max choice (x) of how much of the n-th object we put into the knapsack, with immediate return c(n), plus the max total return from the remaining n-1 objects with the remaining space (s - a(n)x). The value of x (the stage decision variable) is any non-negative integer up to the space allowed: |_s/a(n)_| is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on x, such as the 0-1 knapsack problem, in which case the policy set, X(n,s) simply has only those values.

 


Notation

Send questions and comments to icsMPGlossary@mail.informs.org.
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2008, Mathematical Programming Glossary, by the INFORMS Computing Society