Mathematical Programming Glossary - B
Backbone. Originally, this was a set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true; then the backbone is the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, where the backbone is {j: x_j = 1 for all x in X*} (x is a vector of binary variables in the IP formulation; X* is usually the optimality region, but it could be enlarged to include solutions whose objective value lies within some ball around the optimal value). Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of its backbone.

Backtracking. A procedure to return to a node in a heuristic search that has descendants not yet expanded (see branch and bound). A common procedure for a depth-first search is LIFO: go back along the path from the current node toward the root until a node is found with a descendant requiring expansion.

Backward substitution. The recursion to solve a nonsingular upper triangular system, Ux=b. It starts with x(n) = b(n)/U(n,n), then

x(i) = (b(i) - Sum_j{U(i,j)x(j): j > i}) / U(i,i)   for i = n-1, ..., 1.
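
A minimal NumPy sketch of the recursion (0-based indexing):

    import numpy as np

    def backward_substitution(U, b):
        """Solve Ux = b for a nonsingular upper triangular U."""
        n = len(b)
        x = np.zeros(n)
        x[n-1] = b[n-1] / U[n-1, n-1]        # start with the last equation
        for i in range(n-2, -1, -1):         # then i = n-2, ..., 0
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    # quick check against a direct solve
    U = np.triu(np.random.rand(4, 4)) + 4 * np.eye(4)
    b = np.random.rand(4)
    assert np.allclose(backward_substitution(U, b), np.linalg.solve(U, b))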

Backward transformation (abbr. BTRAN). This applies to the factored system, u[E_1*E_2*...*E_n]=c, where each E_i is an elementary matrix. The recursion is:

u_n E_n = c
u_(n-1) E_(n-1) = u_n
...
u_1 E_1 = u_2

In the end, u = u_1 is the solution. Each elementary system is solved as follows. For notational convenience, suppose the system is uE = w, and v is the distinguished p-th column of E. Then,

u(i) = w(i) for i not= p;

u(p) = (w(p) - Sum_i{w(i)v(i): i not= p}) / v(p).

This is what is done in the (revised) simplex method to update the pricing vector.
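
A minimal NumPy sketch, assuming each elementary matrix is stored as its distinguished column v together with that column's index p (names are illustrative):

    import numpy as np

    def solve_elementary(w, v, p):
        """Solve u E = w, where E is the identity except column p, which is v."""
        u = w.copy()
        other = np.arange(len(w)) != p
        u[p] = (w[p] - w[other] @ v[other]) / v[p]
        return u

    def btran(factors, c):
        """Solve u [E_1 E_2 ... E_n] = c, sweeping from E_n back to E_1."""
        u = c.copy()
        for v, p in reversed(factors):        # each factor is a pair (v, p)
            u = solve_elementary(u, v, p)
        return u

    # check on one elementary system
    n, p = 4, 2
    v = np.random.rand(n); v[p] += 1.0        # keep v(p) away from zero
    E = np.eye(n); E[:, p] = v
    w = np.random.rand(n)
    assert np.allclose(solve_elementary(w, v, p) @ E, w)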

Backward triangularization. An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as the pivot row). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing the other column counts accordingly). This results in a lower triangular rearrangement if, and only if, such a rearrangement exists.
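
A sketch of the recursion for a square matrix stored as a NumPy array (names are illustrative):

    import numpy as np

    def backward_triangularize(A):
        """Return (row_order, col_order) giving a lower triangular
        rearrangement of square A, or None if none exists."""
        m, n = A.shape
        rows, cols = set(range(m)), set(range(n))
        row_order, col_order = [], []
        while cols:
            # find a singleton column in the active submatrix
            singleton = None
            for j in cols:
                nz = [i for i in rows if A[i, j] != 0]
                if len(nz) == 1:
                    singleton = (nz[0], j)
                    break
            if singleton is None:
                return None        # no lower triangular rearrangement exists
            i, j = singleton
            row_order.append(i); col_order.append(j)   # assign to the rear
            rows.remove(i); cols.remove(j)
        # pairs were assigned rear-to-front, so reverse them;
        # A[np.ix_(row_order, col_order)] is then lower triangular
        row_order.reverse(); col_order.reverse()
        return row_order, col_order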

Balance equation. A constraint of the form I(x) - O(x) = 0, where I(x) is the input and O(x) is the output, each as a function of the decision variables, x. Typically this is linear and conserves flow, measured in units of volume or mass, through a node. In an ordinary network, N=[V,A], where there are no gains or losses, this appears as:

        ___                ___ 
        \                  \
         |  x(j,i)    -    |  x(i,j)  =  0   at node i.
        /                  /
        ---                --- 
       j:(j,i) in A       j:(i,j) in A  
       =============      ==============
        I(x) = flow        O(x) = flow
        into node i        out of node i 

Barrier function. P is a barrier function on the strict interior, say I, of the set S if P is continuous, and P(x)-->infinity as x approaches any point in S\I. This arises in penalty function methods for certain classes that include convex programs of the form: Min{f(x): g(x) >= 0} (where I = {x: g(x) > 0}). Two popular choices are P(x) = Sum_i{1/g_i(x)}, and P(x) = -Sum_i{log(g_i(x))}.
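
A minimal sketch of a barrier method with the logarithmic barrier on an illustrative convex program, decreasing the barrier weight mu (the solver choice and weight schedule are assumptions):

    import numpy as np
    from scipy.optimize import minimize

    # Min f(x) subject to g(x) >= 0, with I = {x: g(x) > 0}
    f = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2
    g = lambda x: np.array([1 - x[0] - x[1], x[0], x[1]])

    def barrier_solve(x, mus=(1.0, 0.1, 0.01, 0.001)):
        for mu in mus:
            # P(x) = -Sum_i{log(g_i(x))}; +inf outside the strict interior
            obj = lambda x, mu=mu: (f(x) - mu * np.log(g(x)).sum()
                                    if np.all(g(x) > 0) else np.inf)
            x = minimize(obj, x, method='Nelder-Mead').x
        return x

    print(barrier_solve(np.array([0.2, 0.2])))   # approaches (1, 0)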

Barycenter. From physics, this is the center of mass: the point at which a mass (body) can be considered as being concentrated without altering the effect of the attraction that the earth has upon it. This is the average position ({x^i}) of particles, weighted by their masses ({m_i}): Sum_i{m_i x^i}/Sum_i{m_i}. In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope: Sum_i{x^i}/m, where {x^i} = {x^1, ..., x^m} is the set of extreme points. More generally, this could be a weighted average:

Sum_i{w_i x^i}, where w is in S_m (the simplex: w >= 0 and Sum_i{w_i} = 1).

Basic. Associated with a submatrix of A, say B, whose columns form a basis for R^m (i.e., B consists of m linearly independent columns of A).

Benders' decomposition. This separates complicating variables. The (original) semi-linear model is:

Max{cx + f(y): Ax + F(y) <= b, x >= 0, y in Y},

so that fixing y leaves a linear program in x.
Generalized Benders decomposition considers the following mathematical program:

Max{f(x, y): g(x, y) <= 0, h(x, y) = 0, x in X, y in Y}.

Suppose this is not a convex program in total, but that for each fixed y in Y, the remaining mathematical program (with x the only variable) is convex. The idea is to consider

v(y) = Sup{f(x, y): g(x, y) <= 0, h(x, y) = 0, x in X},

and solve Max{v(y): y in Y}, with evaluations of v requiring the solution to the convex program.
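
A minimal sketch for a toy instance in which the inner problem is a linear (hence convex) program and Y is a small finite set, so Max{v(y): y in Y} is solved by enumerating y and evaluating v (no Benders cuts are generated; all data are illustrative):

    import numpy as np
    from scipy.optimize import linprog

    # f(x,y) = cx + dy,  g(x,y) = Ax + By - b <= 0,  x >= 0
    c, d = np.array([3.0, 2.0]), np.array([1.0])
    A, B, b = np.array([[1.0, 1.0]]), np.array([[2.0]]), np.array([4.0])

    def v(y):
        # inner convex program: Max{cx : Ax <= b - By, x >= 0}
        res = linprog(-c, A_ub=A, b_ub=b - B @ y, bounds=[(0, None)] * len(c))
        return (-res.fun + d @ y) if res.success else -np.inf

    Y = [np.array([t]) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
    y_best = max(Y, key=v)
    print(y_best, v(y_best))      # y = [0.], v = 12.0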

Biconcave function. A function whose negative is biconvex.

Biconvex function (F). F(x, y) on X*Y is convex on X for each y in Y and convex on Y for each x in X. An example is xy on R² (which is not convex in both x and y together).

Biconvex program. A nonlinear program with each function biconvex or biconcave, such that if x or y is fixed, the resulting mathematical program in y or x, resp., is a convex program. A special case is the bilinear program.

Bidding algorithm (or auction algorithm). This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).

Bilevel program. A multilevel program with two levels (sets of variables).

Bilinear function. A special quadratic function, x'Qy + cx + by, that is linear in y for each x, and linear in x for each y.

Bilinear program. A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.

Big-M method. An alternative to using Phase I: a large number (M) is used as a linear penalty in the composite objective:

Min{cx + Mev: Ax + v = b, x, v >= 0},

where v is the vector of artificial variables and e = (1, 1, ..., 1).
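
A small illustration with scipy.optimize.linprog, appending one artificial variable per equation (data are illustrative; M must dominate the original costs):

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0], [2.0, 0.5]])
    b = np.array([3.0, 4.0])                 # b >= 0 so v = b is feasible
    M = 1e6                                  # the "big M" penalty

    m, n = A.shape
    c_big = np.concatenate([c, M * np.ones(m)])   # costs for (x, v)
    A_big = np.hstack([A, np.eye(m)])             # Ax + Iv = b
    res = linprog(c_big, A_eq=A_big, b_eq=b, bounds=[(0, None)] * (n + m))
    x, v = res.x[:n], res.x[n:]
    print(x, v)     # the original LP is feasible iff v ~ 0 at the optimum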

Bin packing problem. Partition a given set of integers, {c_1,...,c_n}, into bins such that the sum of all integers in each bin does not exceed the bin capacity, say b, so as to minimize the number of bins required.
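
The problem is NP-hard; a common heuristic is first-fit decreasing, sketched below:

    def first_fit_decreasing(items, capacity):
        """Place each item (largest first) into the first bin where it
        fits; open a new bin when none fits. Heuristic, not exact."""
        bins = []
        for c in sorted(items, reverse=True):
            for bin_ in bins:
                if sum(bin_) + c <= capacity:
                    bin_.append(c)
                    break
            else:
                bins.append([c])
        return bins

    print(first_fit_decreasing([7, 5, 4, 3, 1], capacity=10))
    # [[7, 3], [5, 4, 1]] -- 2 bins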

Binary relation on sets S and T. A subset of S*T, say R. When T=S, we refer simply to a relation on the set S.

    Here are 5 properties that distinguish relations on a set (x, y, z are in S):
  1. antisymmetric - (x, y) in R and x not= y imply (y, x) not in R.
  2. irreflexive - (x, x) not in R.
  3. reflexive - (x, x) in R.
  4. symmetric - (x, y) in R implies (y, x) in R.
  5. transitive - (x, y) and (y, z) in R imply (x, z) in R.
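
These properties are easy to test mechanically; a small sketch for a relation stored as a set of pairs:

    from itertools import product

    def properties(R, S):
        """Classify a binary relation R (a set of pairs) on the set S."""
        return {
            'antisymmetric': all((y, x) not in R for x, y in R if x != y),
            'irreflexive':   all((x, x) not in R for x in S),
            'reflexive':     all((x, x) in R for x in S),
            'symmetric':     all((y, x) in R for x, y in R),
            'transitive':    all((x, z) in R
                                 for (x, y), (w, z) in product(R, R) if y == w),
        }

    print(properties({(1, 2), (2, 3), (1, 3)}, {1, 2, 3}))   # strict order <
    # antisymmetric, irreflexive and transitive; not reflexive or symmetric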

Binary variable. A decision variable with two feasible values, usually 0 or 1.

Binding constraint. A constraint whose removal changes the optimality region. (Some authors require that the removal results in a strict improvement in the objective value.)

Bisection. Taking the midpoint of an interval known to contain a solution. If the solution is in the interval (a,b), bisection chooses x = (a+b)/2 as the next point to evaluate. The result reduces the interval to (a,x) or (x,b), so the length is cut in half with each evaluation.
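
For example, locating a root of a continuous function that changes sign on (a, b) (a minimal sketch):

    def bisect(f, a, b, tol=1e-8):
        """Find a root of f in (a, b), assuming f(a) and f(b) differ in sign."""
        fa = f(a)
        while b - a > tol:
            x = (a + b) / 2.0                # midpoint
            fx = f(x)
            if fx == 0:
                return x
            if fa * fx < 0:                  # solution is in (a, x)
                b = x
            else:                            # solution is in (x, b)
                a, fa = x, fx
        return (a + b) / 2.0

    print(bisect(lambda x: x**2 - 2, 0.0, 2.0))   # ~1.41421356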

Bland's rule. This is for pivot selection in the simplex method to avoid cycling:

  • If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with lowest index.
  • If more than one (basic) column ties in the ratio test that determines the leaving variable, select the one with the lowest index.
Beale's example of cycling shows that Bland's Rule must apply to all candidates for entering the basis, not just those with the most negative reduced cost. The fifth tableau is the first time Bland's Rule is needed to break the cycle: variable x1 is chosen to enter the basis rather than variable x5.
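
A sketch of the two selections, assuming the reduced costs and the min-ratio values for the chosen pivot column are already computed (ratios[i] is None when row i is not a candidate):

    def blands_entering(reduced_costs, tol=1e-10):
        """Entering column: lowest index with negative reduced cost
        (minimization)."""
        for j, d in enumerate(reduced_costs):
            if d < -tol:
                return j
        return None                          # optimal: no candidate remains

    def blands_leaving(ratios, basic_index, tol=1e-10):
        """Leaving row: among rows attaining the minimum ratio, pick the
        one whose basic variable has the lowest index."""
        finite = [(r, i) for i, r in enumerate(ratios) if r is not None]
        if not finite:
            return None                      # unbounded direction
        rmin = min(r for r, _ in finite)
        ties = [i for r, i in finite if r <= rmin + tol]
        return min(ties, key=lambda i: basic_index[i])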

Blending problem. A blend is some combination of materials to make another material. Given raw materials, their blends make intermediate materials, called stock, and/or final materials, called products. (There can be other blending activities that make products from stocks.) The raw materials have purchase costs and the blending activities have costs of operation and maintenance. The products have either fixed demands or selling prices, and the problem is to find blends that minimize total net cost (or maximize profit). This arises in refinery operations, in the food industry, and in other process industries. The problem can sometimes be solved with linear programming, but there can be complicating relations that require nonlinear programming, such as the pooling problem.

Block pivot. Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column corresponding to the right-hand side):

               .-          -.
               | I 0 P R  e |
           A = |            |
               | 0 I Q S  f |
               |_          _| 
(Note: each I is an identity matrix of appropriate dimension.) If P is nonsingular, the basis can be changed in one step: all (nonbasic) variables associated with the columns of P enter, in exchange for the basic variables associated with the rows of P.

Blocking polyhedron (of a polyhedron, P). {y in R^n+: y'x >= 1 for all x in P}.

Bolzano-Weierstrass theorem. This is a fundamental existence theorem:

A continuous function on a non-empty compact set achieves a minimum and a maximum there.
A useful corollary to the Bolzano-Weierstrass theorem is:
Suppose f is continuous on its non-empty feasible region, F, and that F is a closed set. Then, f achieves a maximum on F if there exists x0 such that {x in F: f(x) >= f(x0)} is bounded.
In particular, if we have a quadratic objective of the form f(x) = x'Qx + cx, it is sufficient for Q to be negative definite. (Although often credited to Weierstrass alone, the theorem was proven by Bolzano in 1817 and was also known to Cauchy.)

Bordered hessian. The complete hessian of the Lagrangian:

      .-                                            -.
      |  H_x[L(x,u,v)]    grad_g(x)'    grad_h(x)'   |
      |                                              |
      |  grad_g(x)            0             0        |
      |                                              |
      |  grad_h(x)            0             0        |
      |_                                            _|
where H_x[L(x,u,v)] is the hessian of the Lagrangian (L) with respect to x only. The southwest and northeast blocks are the constraint gradients because those equal the cross derivatives between x and the Lagrange multipliers (u,v).

Bottleneck assignment problem. This is the assignment problem where the "cost", c(i, j), is the rate at which person i can perform job j. The objective is to maximize the rate of production of the entire assembly line, which is the minimum of the assigned rates.

Bottleneck transportation problem. This is the transportation problem with a traversal time instead of an arc cost, and the objective is to minimize the time it takes to satisfy the demands:

Min max{t_ij: x_ij > 0}: x >= 0, Sum_j{x_ij} <= s_i for all i, Sum_i{x_ij} >= d_j for all j,

where x_ij is the flow from source i to destination j, and t_ij is the time it takes to ship across arc (i, j) (independent of the quantity shipped).

Bottleneck TSP problem. Is there a TSP tour whose value is less than some given value?

Box constraint. Simple bounds of the form L <= x <= U.

Braess paradox. The phenomenon that, in a traffic network in which travel times depend on link (arc) flows, the addition of a link can increase travel time for every user when all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples for an example.

Branch and bound (B&B). Born in integer programming (IP), B&B is more generally used in global optimization, including continuous functions. A prototype is as follows.

  1. Initialize the list of sets with the feasible region (X plus all constraints), say S, with associated bounds -inf, inf (finite bounds on f can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize f* equal to the lower bound (-inf, unless an initial feasible solution is known).
  2. Branch by selecting the k-th set from the list, say S_k, and splitting it into a partition, say S_k = A\/B, such that the interior of A does not intersect the interior of B. (One could partition S_k into more than 2 subsets.)
  3. Bound f in A and/or in B, say L(S) <= f(x) <= U(S) for all x in S, for S = A and/or B. Define U(S) = -inf if it is determined that S is empty (in which case S will be discarded in the next step).
  4. Test whether U(S) <= f* + t, for S = A and/or B. If not, put S on the list. (If the test passes, S is discarded by not being put on the list. Note: f* is the best solution value to date, and t is a tolerance specifying that we want a solution whose objective value is within t of the optimal value.) Further, if L(S) > f* and the associated point, x(S), is feasible (where f(x(S)) = L(S)), replace f* with L(S) and x* with x(S). If the list is not empty, return to step 2 (branch).

B&B terminates when the list becomes empty, and the best feasible solution obtained is x* with value f*, which is within t of optimality. (If f* = -inf, there is no feasible solution.)
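
As a concrete instance of the prototype, here is a best-first sketch for the 0-1 knapsack problem, Max{vx: wx <= capacity, x binary}, using the LP (fractional) relaxation for the upper bounds (names and data are illustrative):

    import heapq

    def knapsack_bb(values, weights, capacity):
        n = len(values)
        order = sorted(range(n), key=lambda i: -values[i] / weights[i])
        v = [values[i] for i in order]
        w = [weights[i] for i in order]

        def bound(k, val, room):             # LP bound over items k..n-1
            for i in range(k, n):
                if w[i] <= room:
                    val += v[i]; room -= w[i]
                else:
                    return val + v[i] * room / w[i]   # fractional item
            return val

        best = 0
        heap = [(-bound(0, 0, capacity), 0, 0, capacity)]  # (-U, level, val, room)
        while heap:
            negU, k, val, room = heapq.heappop(heap)
            if -negU <= best:                # test: cannot improve, discard
                continue
            if k == n:
                continue
            if w[k] <= room:                 # branch: include item k ...
                best = max(best, val + v[k])
                heapq.heappush(heap, (-bound(k + 1, val + v[k], room - w[k]),
                                      k + 1, val + v[k], room - w[k]))
            heapq.heappush(heap, (-bound(k + 1, val, room),  # ... or exclude it
                                  k + 1, val, room))
        return best

    print(knapsack_bb([10, 13, 7, 8], [4, 6, 3, 5], capacity=10))   # 23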

    Notes:
  1. A particular bounding rule for IP is to solve the LP relaxation, whose value is an upper bound on the integer-valued max. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
  2. Branching rules can range from breadth-first: expand all possible branches from a node in the search tree before going deeper into the tree, to depth-first: expand the deepest node first (and backtrack, as necessary). For example, suppose we branch on the value of x_j, which could be 0 or 1. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for x_j=0 and postpone the complementary case, x_j=1 to consider another variable dichotomy, going deeper into the search tree. To illustrate, the following are search progressions in {0,1}².

    Breadth-first:

             ____.                 ____.____          ____.____
            |                     |         |        |         |
          (x=0)                 (x=0)     (x=1)    (x=0)     (x=1)
                                                    /
                                                (y=0)
                 1                     2                  3
    
             ____.____             ____.____          ____.____
            |         |           |         |        |         |
          (x=0)     (x=1)       (x=0)     (x=1)    (x=0)     (x=1)
           /\                    /\         /       /\         /\
       (y=0)(y=1)            (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
                 4                     5                  6
     

    Depth-first:

              ____.                ____.              ____.
             |                    |                  |
           (x=0)                (x=0)              (x=0)
                                 /                  /\
                             (y=0)              (y=0)(y=1)
                 1                     2                  3
    
              ____.____            ____.____          ____.____
             |         |          |         |        |         |
           (x=0)     (x=1)      (x=0)     (x=1)    (x=0)     (x=1)
            /\                   /\         /       /\         /\
        (y=0)(y=1)           (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
                 4                     5                  6
     
  3. A best-first search is to select the node (i.e., set on list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained.
  4. Backtracking need not go in exactly the reverse order (i.e., need not have a LIFO rule), but care must be taken when reordering the path.
  5. When the problem is not finite (not IP), branching rules need to consider the hyper-volume of each set in a partition to ensure convergence.
  6. The B&B methods we consider in OR are special cases of heuristic search in AI.

Branch and cut. A hybrid of branch and bound and cutting plane methods.

Branch and price. The "price" operation is the same as column generation.

Breakpoint. A point where some property of a function changes, such as its slope. For example, a piece-wise linear, univariate function with k breakpoints has the form:

f(x) = a(i)x + b(i) for c(i) <= x < c(i+1), i = 1, ..., k,

where c(k+1) is defined to be infinity and c(1) is the least value of x. This function is continuous if adjacent pieces agree at the breakpoints, and the slope changes at c(i+1) if a(i) not= a(i+1) for i=1, ..., k-1.
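
A sketch of evaluating such a function, storing the pieces' slopes a, intercepts b, and breakpoints c (0-based indexing):

    import bisect

    def piecewise_linear(a, b, c):
        """f(x) = a[i]*x + b[i] on [c[i], c[i+1]), with c(k+1) = +infinity
        implicit; a, b hold the k pieces and c the k breakpoints."""
        def f(x):
            if x < c[0]:
                raise ValueError("x is below the least breakpoint c(1)")
            i = bisect.bisect_right(c, x) - 1     # piece containing x
            return a[i] * x + b[i]
        return f

    # two pieces meeting continuously at x = 1: slope 1, then slope -2
    f = piecewise_linear(a=[1.0, -2.0], b=[0.0, 3.0], c=[0.0, 1.0])
    print(f(0.5), f(1.0), f(2.0))   # 0.5 1.0 -1.0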

Broyden family. This is a family of algorithms to solve an unconstrained nonlinear program where the objective function is in C². The algorithms in this family differ by how they update the approximation of the hessian inverse (assumed to exist). These are of the form:

H = aH^(BFGS) + (1-a)H^(DFP),

and the two matrices are the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update and the Davidon-Fletcher-Powell (DFP) update, respectively. If a is in [0, 1], this is called the Broyden convex family.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. This is a method to solve an unconstrained nonlinear program that proceeds as follows.

    Start with any symmetric, negative definite matrix, say B (e.g., -I), and any point, say x. Compute g=grad_f(x), and set each of the following:
  1. direction: Solve Bd = -g.
  2. step size: s in argmax{f(x + td): t >= 0}.
  3. change in position: p = sd.
  4. new point and gradient: x' = x + p and g' = grad_f(x').
  5. change in gradient: q = g' - g.
  6. Replace x with x' and update B by the BFGS update to complete the iteration.

Compare this with the DFP method, and note a key difference: DFP estimates the inverse hessian, whereas BFGS estimates the hessian (and solves the linear system). There is a correspondence, and empirical evidence suggests BFGS suffers less from numerical error.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) update. This is a way to update an approximation of a hessian, used for unconstrained optimization.

       B' = B + qq'/(q'p) - (Bp)(Bp)'/(p'Bp),

where B' is the update from B (the other primes denote transposes). Taking the inverse to compare with the DFP update,

       H' = H + (1 + q'Hq/(q'p))(pp'/(p'q)) - (pq'H + Hqp')/(q'p),
where H' is the update from H, and

p = change in position = x^(k+1) - x^k (originally = a H grad_f, where a is a positive scalar determined by line search);
q = change in gradient = grad_f(x+p) - grad_f(x).

(Note: pp', pq'H and Hqp' are matrices, and all denominators are inner products, presumed to be nonzero.)
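
In this notation (with ' read as transpose), a NumPy sketch of both updates, checking numerically that they remain inverses of each other:

    import numpy as np

    def bfgs_update(B, p, q):
        """BFGS update of the hessian approximation B from the change in
        position p and change in gradient q (q'p must be nonzero)."""
        Bp = B @ p
        return B + np.outer(q, q) / (q @ p) - np.outer(Bp, Bp) / (p @ Bp)

    def bfgs_inverse_update(H, p, q):
        """The corresponding update of the inverse approximation H."""
        qp = q @ p
        Hq = H @ q
        return (H + (1 + q @ Hq / qp) * np.outer(p, p) / qp
                  - (np.outer(p, Hq) + np.outer(Hq, p)) / qp)

    n = 4
    B, p, q = np.eye(n), np.random.rand(n), np.random.rand(n)
    assert np.allclose(np.linalg.inv(bfgs_update(B, p, q)),
                       bfgs_inverse_update(np.linalg.inv(B), p, q))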

Bundle method, bundle algorithm. A class of nonsmooth-optimization methods for the numerical minimization of a convex function, θ : R^n → R. At the kth iteration the technique uses the available information contained in a bundle (θ(v_1), g_1, ..., θ(v_k), g_k) that is obtained from an oracle; here each g_i is a subgradient of θ at v_i.

Important applications are Lagrangian relaxation and column generation applied to a primal problem

max { p(x) : x ∈ X, h(x) = 0 ∈ R^n }.

In this context, for a given v ∈ R^n,

θ(v) = max { p(x) - v'h(x) : x ∈ X }

results from the maximization of the Lagrange function; each g_i is -h(x_i), with x_i maximizing the Lagrangian at v_i. Minimizing θ is the dual problem.

When minimizing θ(v), a basic method to construct the next iterate v_(k+1) uses the cutting-plane paradigm: one minimizes the piecewise-linear function

θ_k(v) = max { θ(v_i) + g_i'(v - v_i) : i = 1, ..., k },

which can be viewed as a model of θ (in fact θ_k <= θ). Minimizing θ_k is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability, i.e., the sequence {v_k} usually oscillates uncontrollably. Several possibilities exist to stabilize the process; one, adopted in the bundle approach, replaces the minimization of θ_k by the quadratic program

min { θ_k(v) + (1/(2t)) |v - C|² }.

Here t > 0 is a stability parameter that is usually adjusted at each iteration, and the stability center C is some v_i (i <= k) taken from the bundle, typically one that gave the best θ-value.
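
A toy sketch of the stabilized iteration for a small piecewise-linear θ, solving the quadratic master as a smooth program in (v, r) with SLSQP; the serious-step test is simplified relative to practical bundle codes:

    import numpy as np
    from scipy.optimize import minimize

    # theta(v) = max_j (A[j]@v + b[j]), with a subgradient oracle
    A = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])
    b = np.array([0.0, 1.0, 0.5])

    def oracle(v):
        j = np.argmax(A @ v + b)
        return (A @ v + b)[j], A[j]           # theta(v) and one subgradient

    def bundle_min(v, t=1.0, iters=20):
        C = v.copy()                          # stability center
        bundle = [(v.copy(), *oracle(v))]     # entries (v_i, theta(v_i), g_i)
        for _ in range(iters):
            # master: min_v theta_k(v) + (1/(2t))|v - C|^2, lifted to (v, r)
            obj = lambda z: z[-1] + (z[:-1] - C) @ (z[:-1] - C) / (2 * t)
            cons = [{'type': 'ineq',
                     'fun': (lambda z, vi=vi, ti=ti, gi=gi:
                             z[-1] - ti - gi @ (z[:-1] - vi))}
                    for vi, ti, gi in bundle]
            z0 = np.append(C, oracle(C)[0])
            v = minimize(obj, z0, constraints=cons).x[:-1]
            theta_v, g = oracle(v)
            bundle.append((v.copy(), theta_v, g))
            if theta_v < oracle(C)[0]:        # serious step: move the center
                C = v.copy()
        return C

    print(bundle_min(np.array([2.0, 2.0])))   # approaches (0.5, 0)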




Send questions and comments to icsMPGlossary@mail.informs.org.
Copyright© 1996 – 2014, Mathematical Programming Glossary, by the INFORMS Computing Society