Note: This version of the glossary is obsolete and is not maintained. Please use http://glossary.computing.society.informs.org

Mathematical Programming Glossary - B

Backbone. Originally, this was the set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true; the backbone is then the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, where the backbone is {j: x_j = 1 for all x in X*} (x is a vector of binary variables in the IP formulation; X* is usually the optimality region, but it could allow some ball around the optimal objective value). Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of its backbone.

Backtracking. A procedure to return to a node in a heuristic search whose descendants have not all been expanded (see branch and bound). A common procedure for a depth-first search is LIFO: go back along the path from the current node toward the root until the first node is found that has a descendant requiring expansion.

Backward substitution. The recursion used to solve a nonsingular upper triangular system, Ux = b: x(n) = b(n)/U(n, n), then

x(j) = [b(j) - Sum_i{U(j, i)x(i): i > j}] / U(j, j) for j = n-1, ..., 1.

Backward transformation. Solving a system of the form u E_n E_(n-1) ... E_1 = c, where each E_k is an elementary matrix, by the sequence of elementary systems:

u_n E_n = c, u_(n-1) E_(n-1) = u_n, ..., u_1 E_1 = u_2.

In the end, u = u_1 is the solution. Each elementary system is solved as follows. For notational convenience, suppose the system is uE = w, and v is the distinguished p-th column of E. Then, u(i) = w(i) for i not= p; u(p) = (w(p) - Sum_i{w(i)v(i): i not= p}) / v(p). This is what is done in the (revised) simplex method to update the pricing vector.

Backward triangularization. An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as its pivot). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing other column counts accordingly).
This results in a lower triangular rearrangement if, and only if, such a rearrangement exists.

Balance equation. The conservation-of-flow constraint in a network: flow into node i equals flow out of node i,

Sum{x(j, i): (j, i) in A} - Sum{x(i, j): (i, j) in A} = 0 at node i,

where the first sum is the flow into node i and the second sum is the flow out of node i.

Barrier function. P is a barrier function on the strict interior, say I, of the set S if P is continuous and P(x) --> infinity as x approaches any point in S\I. This arises in penalty function methods for certain classes of problems, including convex programs of the form: Min{f(x): g(x) >= 0} (where I = {x: g(x) > 0}). Two popular choices are P(x) = Sum_i{1/g_i(x)} and P(x) = -Sum_i{log(g_i(x))}.

Barycenter. From physics, this is the center of mass: the point at which a mass (body) can be considered as being concentrated without altering the effect of the attraction that the earth has upon it. This is the average position ({x^i}) of particles, weighted by their masses ({m_i}): Sum_i{m_i x^i}/Sum_i{m_i}. In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope: Sum_i{x^i}/m, where {x^i} = {x^1, ..., x^m} is the set of extreme points. More generally, this could be a weighted average: Sum_i{w_i x^i}, where w is in S_m.

Basis. In linear programming, a nonsingular square submatrix, B, of the constraint matrix A, whose columns are associated with the basic variables. Here are some related terms that arise in linear programming:

Adjacent basis. One that differs in exactly one column from a given basis.

Basic column. A column of the basis matrix.

Basic variable. The variable, say x_j, associated with the j-th column of the basis matrix.

Basic level. The value of a basic variable.

Basic solution. The solution (x) obtained by setting the nonbasic variables to some bound value (like 0), resulting in a unique solution for the basic variables. That is, Ax = b is equivalent to Bu + Nv = b, where A = [B N] and x = [u v]. Upon fixing the value of v, the nonsingularity of B gives the basic solution u = B^(-1)[b - Nv]. (In a standard LP, v = 0, so u = B^(-1)b.)

Basic feasible solution.
A basic solution that is feasible, i.e., the basic values satisfy their bounds. (In a standard LP, this means B^(-1)b >= 0.)

Basis kernel. After performing forward triangularization, if the basis does not triangularize completely, backward triangularization is applied. The result is a (rearranged) blocking of the basis into three segments:

```
|\
| \            <--- Forward triangle
|__\______
|   |     |
|   |     |    <--- Kernel
|   |_____|
|         |\
|         | \  <--- Backward triangle
|_________|__\
```

Each row and column in the kernel has at least 2 nonzeroes.

Benders decomposition. Applies to a mathematical program of the form:

Max{cx + F(y): x in X, y in Y, Ax + G(y) <= b}.

For each fixed y in Y, the remaining problem is a linear program in x; the strategy is to solve a master problem in y, with cuts generated from the linear programs. Generalized Benders decomposition considers the following mathematical program: Max{f(x, y): g(x, y) <= 0, h(x, y) = 0, x in X, y in Y}. Suppose this is not a convex program in total, but that for each fixed y in Y, the remaining mathematical program (with x the only variable) is a convex program. The idea is to consider v(y) = Sup{f(x, y): g(x, y) <= 0, h(x, y) = 0, x in X} and solve Max{v(y): y in Y}, with each evaluation of v requiring the solution of the convex program.

Biconvex function (F). F(x, y) on X*Y is convex on X for each y in Y and convex on Y for each x in X. An example is xy on R² (which is not convex in x and y jointly).

Biconvex program. A nonlinear program in which each function is biconvex or biconcave, such that if x or y is fixed, the resulting mathematical program in y or x, respectively, is a convex program. A special case is the bilinear program.

Bidding algorithm (or auction algorithm). This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).

Bilevel program. A multilevel program with two levels (sets of variables).

Bilinear program. A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.
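A common heuristic for biconvex programs is alternating minimization: fix y and solve the convex program in x, then fix x and solve in y, repeating until the iterates settle. Here is a minimal Python sketch; the quadratic objective f(x, y) = (x-1)² + (y-2)² + 0.5xy and its closed-form subproblem solutions are illustrative assumptions, not from the glossary, and in general the scheme only guarantees a partial optimum, not a global one.

```python
def argmin_x(y):
    # For fixed y, f(x, y) = (x-1)^2 + (y-2)^2 + 0.5*x*y is convex in x;
    # setting df/dx = 2(x-1) + 0.5*y = 0 gives the minimizer in closed form.
    return 1.0 - 0.25 * y

def argmin_y(x):
    # Symmetrically, df/dy = 2(y-2) + 0.5*x = 0.
    return 2.0 - 0.25 * x

def alternating_minimization(x0=0.0, y0=0.0, tol=1e-10, max_iter=100):
    # Alternate the two convex subproblems until the iterates stop moving.
    x, y = x0, y0
    for _ in range(max_iter):
        x_new = argmin_x(y)
        y_new = argmin_y(x_new)
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

x, y = alternating_minimization()
# Here the sweep is a contraction, so the iterates converge to the
# stationary point (8/15, 28/15) of the full objective.
```

Each sweep solves one convex program per block of variables, which is exactly the structure a biconvex program guarantees.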
Big-M method. An alternative to using Phase I: a large number (M) is used as a linear penalty on the artificial variables in the composite objective, Min{cx + Mev: Ax + v = b, x, v >= 0}, where v is the vector of artificial variables and e = (1, 1, ..., 1).

Binary relation. A subset, R, of pairs from a set S, with (x, y) in R meaning x is related to y. Here are 5 properties that distinguish relations on a set (x, y, z are in S):

antisymmetric - (x, y) in R and x not= y imply (y, x) not in R.
irreflexive - (x, x) not in R.
reflexive - (x, x) in R.
symmetric - (x, y) in R implies (y, x) in R.
transitive - (x, y) and (y, z) in R imply (x, z) in R.

Binary variable. A decision variable with two feasible values, usually 0 or 1.

Binding constraint. A constraint whose removal changes the optimality region. (Some authors require that the removal result in a strict improvement in the objective value.)

Bland's rule. An anti-cycling rule for the simplex method: If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with the lowest index. If more than one (basic) column has the same determining value to leave the basis, select the one with the lowest index. Beale's example of cycling shows that Bland's rule must apply to all candidates for entering the basis, not just those with the most negative reduced cost. The fifth tableau is the first time Bland's rule is needed to break the cycle: variable x1 is chosen to enter the basis, rather than variable x5.

Block pivot. Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than on just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column block corresponding to the right-hand side):

```
    .-             -.
    | I  0  P  R  e |
A = |               |
    | 0  I  Q  S  f |
    '-             -'
```

(Note: each I is an identity matrix of appropriate dimension.) If P is nonsingular, the basis can change by entering all (nonbasic) variables associated with the columns of P in exchange for all basic variables associated with the rows of P.

Bolzano-Weierstrass theorem. A continuous function on a non-empty compact set achieves a minimum and a maximum there.
A useful corollary to the Bolzano-Weierstrass theorem is: Suppose f is continuous on its non-empty feasible region, F, and that F is a closed set. Then f achieves a maximum on F if there exists x0 such that {x in F: f(x) >= f(x0)} is bounded. In particular, if we have a quadratic objective of the form f(x) = x'Qx + cx, it is sufficient for Q to be negative definite. (Although often credited to Weierstrass alone, the theorem was proven by Bolzano in 1817, and it was also known to Cauchy.)

Bordered hessian. The matrix

```
.-                                       -.
| H_x[L(x,u,v)]   grad_g(x)'  grad_h(x)'  |
| grad_g(x)           0           0       |
| grad_h(x)           0           0       |
'-                                       -'
```

where H_x[L(x,u,v)] is the hessian of the Lagrangian (L) with respect to x only. The southwest and northeast blocks are the constraint gradients because each equals the cross derivative between x and the Lagrange multipliers (u, v).

Bottleneck assignment problem. This is the assignment problem where the "cost", c(i, j), is the rate at which person i can perform job j. The objective is to maximize the rate of production of the entire assembly line, which is the minimum rate among the assignments made.

Bottleneck transportation problem. This is the transportation problem with a traversal time instead of an arc cost, and the objective is to minimize the time it takes to satisfy the demands:

minimize max{t_ij: x_ij > 0}: x >= 0, Sum_j{x_ij} <= s_i, Sum_i{x_ij} >= d_j,

where x_ij is the flow from source i to destination j, and t_ij is the time it takes to ship across arc (i, j) (independent of the quantity shipped).

Bottleneck TSP problem. Is there a TSP tour whose value is less than some given value?

Braess paradox. The phenomenon that, in a traffic network in which travel times depend on link (arc) flows, the addition of a link (arc) can increase travel time for every user when all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples for an example.
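The paradox can be checked numerically on the standard four-node instance; the specific numbers below (4000 drivers, link times x/100 and 45 minutes) are the common textbook illustration, assumed here, not part of the glossary.

```python
# Network: routes s->a->t and s->b->t carry 4000 drivers total.
# Variable-time links (s->a and b->t) take x/100 minutes when x drivers
# use them; fixed links (a->t and s->b) take 45 minutes regardless of flow.
DRIVERS = 4000

# Without an a->b link, symmetry splits the traffic evenly at equilibrium,
# so each driver's travel time is:
time_before = (DRIVERS / 2) / 100 + 45        # 20 + 45 = 65 minutes

# Now add a zero-time link a->b. Since x/100 <= 40 < 45 for any feasible
# flow, the route s->a->b->t is never worse than either original route,
# so at equilibrium every driver takes it:
time_after = DRIVERS / 100 + 0 + DRIVERS / 100  # 40 + 0 + 40 = 80 minutes

# Braess paradox: the new link raises every driver's time from 65 to 80.
```

Note that removing the new link would make everyone better off, which is why the result is counterintuitive for road planning.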
Branch and bound (B&B). Born in integer programming (IP), B&B is more generally used in global optimization, including over continuous functions. A prototype is as follows.

Initialize: Put the feasible region (X plus all constraints), say S, on the list, with associated bounds -inf, inf (finite bounds on f can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize f* equal to the lower bound (-inf, unless an initial feasible solution is known).

Branch: Select the k-th set from the list, say S_k, and split it into a partition, say S_k = A ∪ B, such that the interior of A does not intersect the interior of B. (S_k could be partitioned into more than 2 subsets.)

Bound: Bound f on A and/or B, say L(S) <= f(x) <= U(S) for all x in S, for S = A and/or B. Define U(S) = -inf if it is determined that S is empty (in which case S will be discarded in the next step).

Test: Check whether U(S) <= f* + t, for S = A and/or B. If not, put S on the list. (If the test passes, S is discarded by not being put on the list. Note: f* is the best solution value to date, and t is a tolerance specifying that we want a solution whose objective value is within t of the optimal value.) Further, if L(S) > f* and the associated point, x(S), is feasible (where f(x(S)) = L(S)), replace f* with L(S) and x* with x(S). If the list is not empty, go branch.

B&B terminates when the list becomes empty, and the best feasible solution obtained is x* with value f*, which is within t of optimality. (If f* = -inf, there is no feasible solution.)

Notes: A particular bounding rule for IP is to solve the LP relaxation, whose value is an upper bound on the integer-valued max. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
Branching rules can range from breadth-first (expand all possible branches from a node in the search tree before going deeper into the tree) to depth-first (expand the deepest node first, backtracking as necessary). For example, suppose we branch on the value of x_j, which could be 0 or 1. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for x_j = 0 and postpone the complementary case, x_j = 1, to consider another variable dichotomy, going deeper into the search tree. To illustrate, consider searching {0,1}², branching first on x and then on y. Breadth-first expands (x=0), then its sibling (x=1), and only then the (y=0) and (y=1) branches beneath each. Depth-first expands (x=0), then (y=0) and (y=1) beneath it, and only afterwards backtracks to expand (x=1) and its descendants. [Tree diagrams of the two six-stage search progressions omitted.]

A best-first search selects the node (i.e., the set on the list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained. Backtracking need not go in exactly the reverse order (i.e., need not follow a LIFO rule), but care must be taken when reordering the path. When the problem is not finite (not IP), branching rules need to consider the hyper-volume of each set in a partition to ensure convergence. The B&B methods we consider in OR are special cases of heuristic search in AI.

Breakpoint. A point at which a function changes its analytical form, notably for a piecewise-linear function:

f(x) = a(i)x + b(i) for c(i) <= x < c(i+1), for i = 1, ..., k,

where c(k+1) is defined to be infinity and c(1) is the least value of x. This function is continuous if adjacent pieces agree at each breakpoint, and the slope changes at the breakpoint if a(i) not= a(i+1) for i = 1, ..., k-1.
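Evaluating such a piecewise-linear function reduces to locating the interval c(i) <= x < c(i+1) and applying that piece's coefficients. A small Python sketch follows; the particular slopes, intercepts, and breakpoints are made-up illustrations, chosen so adjacent pieces agree at the breakpoints and f is continuous.

```python
import bisect

# Piece i applies on c[i] <= x < c[i+1]; c(k+1) is taken as +infinity.
a = [1.0, 0.5, -2.0]   # slopes a(i)
b = [0.0, 1.0, 8.5]    # intercepts b(i), chosen so f is continuous
c = [0.0, 2.0, 3.0]    # c(1) is the least value of x; breakpoints: c(2), c(3)

def f(x):
    # bisect_right locates the rightmost piece whose c(i) is <= x.
    i = bisect.bisect_right(c, x) - 1
    if i < 0:
        raise ValueError("x is below the least value c(1)")
    return a[i] * x + b[i]
```

For example, f(2.5) = 0.5*2.5 + 1 = 2.25, and at the breakpoint x = 3 the function is continuous while the slope changes from 0.5 to -2.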
Broyden family. Quasi-Newton updates of the form

H = aH^(BFGS) + (1-a)H^(DFP),

where the two matrices are the Broyden-Fletcher-Goldfarb-Shanno update (BFGS) and the Davidon-Fletcher-Powell update (DFP), respectively. If a is in [0, 1], this is called the Broyden convex family.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Start with any symmetric, negative definite matrix, say B (e.g., -I), and any point, say x. Compute g = grad_f(x), and set each of the following:

direction: Solve Bd = -g.
step size: s in argmax{f(x + td): t >= 0}.
change in position: p = sd.
new point and gradient: x' = x + p and g' = grad_f(x').
change in gradient: q = g' - g.

Replace x with x' and update B by the BFGS update to complete the iteration. Compare this with the DFP method, and note a key difference: DFP estimates the inverse hessian, whereas BFGS estimates the hessian (and solves the linear system). There is a correspondence, and empirical evidence suggests BFGS suffers less from numerical error. The BFGS update is

```
             qq'     (Bp)(Bp)'
B' = B  +  -----  -  ---------,
             q'p       p'Bp
```

where B' is the update from B. The primes on the right denote transposes. Taking the inverse to compare with the DFP update,

```
            /       q'Hq \  pp'     pq'H + Hqp'
H' = H  +  |  1 +  ----- |  ---  -  -----------,
            \       q'p  /  p'q         q'p
```

where H' is the update from H, and p = change in position = x^(k+1) - x^k (originally p = aH grad_f, where a is a positive scalar determined by line search); q = change in gradient = grad_f(x + p) - grad_f(x). (Note: pp', pq'H and Hqp' are matrices, and all denominators are inner products, presumed to be nonzero.)

Bundle method, bundle algorithm. A class of nonsmooth optimization methods for the numerical minimization of a convex function, θ : R^n --> R. At the k-th iteration, the technique uses the available information contained in a bundle (θ(v_1), g_1, ..., θ(v_k), g_k) obtained from an oracle; here each g_i is a subgradient of θ at v_i. Important applications are Lagrangian relaxation and column generation applied to a primal problem Max{p(x): x in X, h(x) = 0 in R^n}.
In this context, for a given v in R^n, θ(v) = Max{p(x) - v'h(x): x in X} results from the maximization of the Lagrange function; each g_i is -h(x_i), with x_i maximizing the Lagrangian at v_i. Minimizing θ is the dual problem. When minimizing θ, a basic method to construct the next iterate v_(k+1) uses the cutting-plane paradigm: one minimizes the piecewise-linear function

θ_k(v) = Max{θ(v_i) + g_i'(v - v_i): i = 1, ..., k},

which can be viewed as a model of θ (in fact θ_k <= θ). Minimizing θ_k is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability, i.e., the sequence {v_k} usually oscillates uncontrollably. Several possibilities exist to stabilize the process; the one adopted in the bundle approach replaces the minimization of θ_k by the quadratic program

Min{θ_k(v) + (1/(2t))|v - C|²}.

Here t > 0 is a stability parameter that is usually adjusted at each iteration, and the stability center C is some v_i (i <= k) taken from the bundle, typically one that gave the best θ-value.

Send questions and comments to icsMPGlossary@mail.informs.org.
Copyright © 1996 – 2014, Mathematical Programming Glossary, by the INFORMS Computing Society