Mathematical Programming Glossary - G
Mathematical Programming Glossary - G

Game theory. In general, a (mathematical) game can be played by one player, such as a puzzle, but its main connection with mathematical programming is when there are at least two players, and they are in conflict. Each player chooses a strategy that maximizes his payoff. When there are exactly two players and one player's loss is the other's gain, the game is called zero sum. In this case, a payoff matrix, A, is given where Aij is the payoff to player 1, and the loss to player 2, when player 1 uses strategy i and player 2 uses strategy j. In this representation each row of A corresponds to a strategy of player 1, and each column corresponds to a strategy of player 2. If A is m × n, this means player 1 has m strategies, and player 2 has n strategies. Here is an example of a 2 × 3 payoff matrix:

A = [10 -5 20 \\ -15 5 -15].

A primary connection with mathematical programming is duality. Suppose x=(x1 ... xm) and y=(y1 ... yn) are strategy probability vectors chosen by the two players –  i.e., in a particular play, xi = probability player 1 chooses strategy i and yj = probability player 2 chooses strategy j. Then, the value   (v) of the game is the expected payoff:

v = xAy = sum i,j xiAijyj ,

where player 1 seeks to maximize v and player 2 seeks to minimize v. A pure strategy is a unit vector. For example, if x = (1, 0), player 1 is using his first strategy 100% of the time; if y = (0, 1, 0), player 2 is using his second strategy 100% of the time. The expected payoff is simply the cell entry, A12, which is –5 (player 2 pays 5 units to player 1). The strategies that are not pure are called mixed. For example, x = (½, ½) means player 1 plays each of the two strategies 50% of the time (flipping a coin on each play); y = (¼, ¾, 0) means player 2 plays his first strategy 25% of the time and his second the remaining 75%. The expected payoff is –5/8.

The fundamental duality theorem of linear programming provides the equivalent primal-dual formulation:

Player 1 is Player 2 is
Primal Dual

  max v:   x >= 0, sum i xi = 1,   min v:   y >= 0, sum j yj = 1,
v <= sumi xiAij   for all j v >= sumj Aijyj   for all i

This is the simplest 2-person game; more generally, there can be n persons, and subsets of the players can form cooperative agreements. If this is not allowed, the game is called non-cooperative.

A combinatorial game is a cooperative game with restrictions on the formation of coalitions that induce a combinatorial structure, such as a matroid.

GAMS. Generalized Algebraic Modeling System.

Gauge function. A nonnegative, convex function, from R^n to the extended reals, that is positively homogeneous and f(0)=0. A norm is a gauge function. Here is an example that is not a norm:

                    _
                   | ||(x-z), (x-y)||_2  if 2x - y - z = 0;
      f(x, y, z) = |
                   |_  infinity          otherwise. 
Here is a graphic.

Gaussian elimination. A method to solve Ax=b that performs elementary row operations on A to annihilate successive elements of A in order to reduce A to an upper triangular matrix, U. On paper, the same operations are applied to b, then the solution is obtained by solving the resulting upper triangular system. In a computer, the product of the matrices effecting the elementary row operations is a lower triangular matrix, L, with unit diagonal. Once this phase is completed, the system Ax=b becomes LUx=b. This is then solved in two steps: forward substitution solves Ly=b; then backward substitution solves Ux=y. (Of course, computer implementations vary.)

Gauss-Jordan elimination. A method to solve Ax=b that performs elementary row and column operations on A to annihilate successive elements of A in order to reduce A to an identity matrix. On paper, the same operations are applied to b, then the solution is obtained by solving the resulting identity system. In a computer, the matrices effecting the elementary operations are saved as elementary matrices, say {E_i}. Then, the system is equivalent to [E_1*E_2*...*E_n]x=b, and forward transformation is applied to solve for x. This is what is done in the (revised) simplex method, and each iteration is a pivot operation.

Gauss-Seidel method. An iterative method to solve Ax=b using the latest values of the components of x during the updates. Specifically, A is split into L-T, where L is lower triangular, and the update solves Lx' = Tx + b. (Surprisingly, history indicates Gauss did not know about this method, and Seidel did not recommend it.) Often people refer to the method to mean the use of latest values, rather than the specific splitting. It arises, for example, in parallel computation, where using the latest value of x during the computation of x' is not done since it would not be able to take advantage of the parallelism.

Generalized equation. This has the form:  0 is in {f(x)} + T(x), where f is a function from R^n into R^n and T is a point to set map that maps x into subsets of R^n. If T is absent, the condition reduces to an ordinary equation, f(x)=0. Usually, T is assumed to satisfy the monotonicity condition:

(x-x')(y-y') >= 0 for y in T(x) and y' in T(x').

This includes a variational inequality (and hence the complementarity problem) as a special case.

Generalized inverse. Suppose A is any m by n matrix. A+ is a generalized inverse of A if A+ is n by m and AA+A = A. Then, a fundamental theorem for linear equations is:

The equation, Ax = b, has a solution if, and only if, AA+b = b for any generalized inverse, A+, in which case the solutions are of the form:

x = A+b + (I - A+A)y   for any y in R+n.

Example.

The Moore-Penrose class additionally requires that AA+ and A+A be symmetric (hermitian, if A is complex). In particular, if rank(A) = m, A+ = A'Inv(AA') is a Moore-Penrose inverse. Note that AA+ = I, and A+A is a projection matrix for the subspace {x: x = A'y for some y in R+m}, since x = A'y implies [A+A]x = A'Inv(AA')AA'y = A'y = x. Further, I-A+A projects into its orthogonal complement, which is the null space of A: A[(I-A+A)x] = (A-A)x = 0.

Generalized Lagrange Multiplier method (GLM). This solves a sequence of Lagrangian optimization (relaxation) problems, searching the multiplier space by some method to seek a minimum of the maximum Lagrangian function.

  1. Given (u, v) with u >= 0, find x in argmax{f(x) - ug(x) - vh(x): x in X}.
  2. Test if 0 is in the convex hull of S(x, u, v) := {(b, c): g(x) <= b, h(x)=c, ug(x)=ub}. If so, stop (x solves the original mathematical program if 0 is in S(x, u, v); otherwise, you need to search alternative optima; if no max generates 0 as a member of S, the right-hand side (0) is in a duality gap).
  3. Given (x, u, v), choose (u', v') to reduce the value of L*(u, v).

This can be viewed from several vantages:

Generalized network. A network in which the flow that reaches the destination could be different from the flow that left the source. In the incidence matrix, instead of

           ____
          | -1 | ...source row (i)
          |    |
          |    |
          | +1 | ...destination row (j)
          |____| 
we have
           ____
          | -1 | ...source row (i)
          |    |
          |    |
          |  g | ...destination row (j)
          |____| 
where g > 0. If g < 1, there is a loss; if g > 1, there is a gain.

Generalized reduced gradient method (GRG). This is a generalization of the reduced gradient method by allowing nonlinear constraints and arbitrary bounds on the variables. The form is:

Max f(x): h(x)=0, L <= x <= U,

where h has dimension m. The method supposes we can partition x = (v, w) such that:
  • v has dimension m (and w has dimension n-m);
  • the values of v are strictly within their bounds: L_v < v < U_v (this is a nondegeneracy assumption);
  • grad_v[h(x)] is nonsingular at x = (v,w).

As in the linear case, for any w there is a unique value, v(w), such that h(v(w),w)=0 (c.f., Implicit Function Theorem), which implies that dv/dw = grad_vh(x)–1 grad_w[h(x)]. The idea is to choose the direction of the independent variables to be the reduced gradient: grad_w[f(x) - yh(x)], where y = dv/dw = (grad_vh(x)–1)grad_w[h(x)]. Then, the step size is chosen and a correction procedure applied to return to the surface, h(x)=0.

The main steps (except the correction procedure) are the same as the reduced gradient method, changing the working set as appropriate.

Generalized upper bound (GUB). An upper bound constraint on a sum of variables: Sum_j{x_j: j in J} <= U. A collection of these that do not overlap (i.e., index sets J are disjoint) comprise the foundation for the generalized upper bounding technique in linear programming. This is where the summation equation is marked in the data structure, rather than represented explicitly as a row in the LP, and the basis is handled in its reduced dimension to gain computational efficiency in both time and space.

Genetic algorithm (GA). A class of algorithms inspired by the mechanisms of genetics, which has been applied to global optimization (especially for combinatorial programs). It requires the specification of three operations (each is typically probabilistic) on objects, called "strings" (these could be real-valued vectors):

  1. Reproduction - combining strings in the population to create a new string (offspring);

    Example: Taking 1st character from 1st parent + rest of string from 2nd parent:

               [001001] + [111111] ===> [011111]   
  2. Mutation - spontaneous alteration of characters in a string;

    Example: Just the left-most string:

                [001001] ===> [101001]   
  3. Crossover - combining strings to exchange values, creating new strings in their place.

    Example: With crossover location at 2:

               [001001]  &  [111111] ===> [001111], [111001]   
These can combine to form hybrid operators, and the reproduction and crossover operations can include competition within populations. Here is a generic GA (strategy):

0. Initialize population.
1. Select parents for reproduction and GA operators (viz., mutation and crossover).
2. Perform operations to generate intermediate population and evaluate their fitness values.
3. Select members of population to remain with new generation.

Repeat 1-3 until some stopping rule is reached.

The original GA (1973, by John Holland) used crossover and total population replacement. This means a population with 2N objects (called chromosomes) form N pairings of parents that produce 2N offsprings. The offsprings comprise the new generation, and they become the total population, replacing their parents. More generally, a population of size N produces an intermediate population of N+M, from which Ñ is kept to form the new population. One way to choose which Ñ survive is by those with the greatest fitness values – survival of the fittest.

Geometric convergence. See convergence.

Geometric mean. Given x in R^n+ and a weight vector, a in the simplex, S_n, the geometric mean is the value:

x(1)^a(1) * x(2)^a(2) * ... * x(n)^a(n)

For example, for a=(.5, .5), the geometric mean of x = (1, 9) is 1*3, which is 3. A fundamental inequality that provides a foundation for geometric programming   is that the geometric mean is never greater than the arithmetic mean:

Prod_j{x(j)^a(j)} <= Sum_j{a(j)x(j)}.

for all x in R^n+ and a in S_n. Further, equality holds only if x(j)=constant for all j.

Geometric program. Min P_0(x): P(x) <= 1 and x >0, where each P_k is a posynomial (k=0,...,m). Conventionally, the monomials are indexed sequentially, so that the posynomials appear as:

P_0 = Sum{c_i Prod_j{x_j ^a(i,j): i=1, ..., N0}

P_1 = Sum{c_i Prod_j{x_j ^a(i,j): i=N0 + 1, ..., N1}

...

P_m = Sum{c_i Prod_j{x_j ^a(i,j): i=N(m-1) + 1, ..., Nm}.

For example, consider:

Min 1/y^3 + 5y/z^2: x, y, z > 0, 1/(xyz) <= 1, x/y + z/x <= 1.

Then, n=3 (variables), m=2 (constraints), N0=2 (monomials in objective), N1=3 (N0 + 1 monomial in constraint 1), and N2=5 (N1 + 2 monomials in constraint 2). The exponent matrix   is the Nm by n matrix whose i-th row contains the exponents of the variables in the i-th monomial. The example has 5 rows and 3 columns:
             .-        -.
             |  0 -3  0 |
             |  0  1 -2 |
         A = | -1 -1 -1 |
             |  1 -1  0 |
             | -1  0  1 |
             |_        _|   
The degree of difficulty is the number of terms in all posynomials (Nm) minus the number of independent variables (rank of exponent matrix) - 1. In the above example, the rank of A is 3, so the degree of difficulty is 5-3-1 = 1. If the last constraint were not present, only the first three rows comprise the exponent matrix, and its rank is 2. In that case, the degree of difficulty is 0. (Using the geometric dual, the solution reduces to solving a system of linear equations, which is what motivates the terminology.)

Also see superconsistent.

Since its inception, the posynomial form has been extended to signomials. In that case, the duality need not be strong, as there can be a duality gap. The general, or signomial, GP is of the form:

min P0(x) - Q0(x): x > 0, P_i(x) - Q_i(x) <= 1 for i=1,...,m,

where P_i and Q_i are posynomials (i=0,...,m).

Global convergence. See convergence.

Global optimization. Generally refers to mathematical programming without convexity assumptions, which are NP-hard. In general, there could be a local optimum that is not a global optimum. Some authors use this term to imply the stronger condition there are multiple local optima. Some solution strategies are given as heuristic search methods (including those that guarantee global convergence, such as branch and bound). As a process associated with algorithm design, some regard this simply as attempts to assure convergence to a global optimum (unlike a purely local optimization procedure, like steepest ascent). See the supplement, by J.D. Pintér.

Glover's linearization. Given a binary variable x and a linear function f(w) in discrete and/or continuous variables w in W for some set W, this linearization reformulates the product f(w)x with a (free) continuous variable z and enforces that z = f(w)x by adding four linear inequalities:

Lx ≤ z ≤ U x, f (w) − U (1 − x) ≤ z ≤ f (w) − L(1 − x),
where the values L and U are defined as
L = min{f(w) : w in WR} and U = max{f(w) : w in WR},
where WR is any relaxation of W.

Global optimum. A solution to a mathematical program (as distinct from a local optimum). When the mathematical program is not convex, there could be (and typically are) many local optima that are not global optima.

Goal program. A goal is neither a constraint nor an objective because it is neither a firm requirement nor a function that must be at an extreme value. In English, a goal represents a relation that is desired, but it may be violated if there are adequate compensations. For example, we may have a budget of $1000, and we prefer to operate within it, but if spending just one more dollar yields a major improvement in operations, we would consider spending $1001.

    This is put into the standard mathematical programming model by two steps:
  1. Add the variable, v, and the constraints, G(x) - v <= 0 and v >= 0, to measure the level of violation. (If the goal were H(x) = 0, instead of G(x) <= 0, the added constraint would be H(x) + u - v = 0, where u and v are (non-negative) levels of violation below and above the goal, respectively.)
  2. Add the penalty term to the objective: P(v), where P(0)=0, and P is strictly increasing -- i.e., v' >= v and v'_i > v_i for some i imply P(v') > P(v).

The resulting mathematical program represents the goal program. If the goal is satisfied, v=0; otherwise, the penalty term, P(v), reflects "adequate compensation" for the violation.

Golden mean (or ratio). The positive solution to the quadratic equation, x^2 + x - 1 = 0, which is (-1+sqrt(5))/2, or approximately .618. This has the proportionality property: x: 1 = (1-x):x, which defines the placements of evaluations for the golden section search.

Golden section search. This finds an interval, [a,b], that contains a maximum of a unimodal function whose domain is [0,1], such that the length of the final interval, [a,b], satisfies: b-a < e (where e > 0 is specified). The golden ratio is denoted below by g (= 0.618).

  1. Initially compute f(g) and f(1-g), and set the interval of uncertainty to the original endpoints, a=0 and b=1. In general, we have x and y placed in [a,b], such that the distances from the endpoints are the same:
                     g(b-a)
                |----------------|
                a------x---------y------b
                       |----------------|
                             g(b-a)
                 
  2. Given the interval [a, b] contains evaluations f(x) and f(y), with x < y, compare: if f(x) >= f(y), replace b with y; if f(x) <= f(y), replace a with x. (If f(x)=f(y), this leaves the interval [x,y], in which case evaluate f(x+g(y-x)).)

    One of the following cases prevail:

                  *                                          *
                            *                     *
           a------x---------y------b       a------x---------y------b
                            : drop :       : drop :
                f(x) > f(y)                           f(x) < f(y)
                  x=g(y-a)                              y=g(b-x)
    
                                 *         *
                          a------x---------y------b
                          : drop :         : drop :
                                 f(x) = f(y)
                 
The golden ratio is the limit of successive fibonacci numbers: (FN-1/FN)–>g. Thus, the golden section search approaches Fibonacci search as the number of functional evaluations (N) becomes large.

Gomory cut. This is an early cutting plane for an integer program. Consider {(x, s): x in Z^n+, s in R^m+, Ax + s = b}, where A and b have integer values. For any y in R^m, let a = yA - |_yA_| and a_0 = yb - |_yb_| (i.e., a is the fractional part of the linear combination of the equations, and a_0 is the fractional part of the same linear combination of the right-hand sides). Also, let c = y - |_y_| (fractional part of y). Then, Gomory's cut is:

ax + cs >= a_0.

y is chosen such that a_0 > 0 and the current solution (with x = 0) violates this inequality in the LP relaxation.

Gomory function. A recursively defined class of functions on Q^n to Q (rational n-vectors to rationals) using three operations: (1) non-negative, rational linear combinations, (2) integer rounding, and (3) maximum. Letting R(v) = integer round-up of v = least integer not less than v, we say f is a Gomory function if it is the identity function, f(x)=x, or it has one of the following forms:

  1. f(x) = ag(x) + bh(x) for some a, b >= 0 and rational;
  2. f(x) = R(g(x));
  3. f(x) = max{g(x), h(x)};
where g and h are Gomory functions. It can be shown this is equivalent to the point-wise maximum of finitely many
Chvátal functions:

f(x) = max{C1(x), ..., Ck(x)},

where each Ci   is a Chvátal function.

This arises in integer linear programming in several ways. One fundamental result is that the optimal value as a function of b is a Gomory function:

f*(b) = inf{cx: Ax = b, x >= 0, x in Zn}.

Here is an example:

Gomory function

Note: The above definition assumes minimization; if the ILP is maximization, R(.) would be round-down (i.e., greatest integer not exceeding the value), and condition 3 would be the point-wise minimum.

Gomory group. The group associated with the corner polyhedron problem:

{v in Z^N+: Sum_j{r(A_j)v_j} = r(b)},

where v corresponds to the nonbasic variables and r(d) is the remainder, d(mod |B|), where |B| is the determinate of the basis, B. The idea is that the nonbasic value chosen in this group, together with the basic levels imputed by the equations Ax=b, yields an integer solution (but a basic level might be negative).

Gradient. Vector of first partial derivatives of a function (assumed to be differentiable at least once). Here this is denoted grad_f(x), where f is the function and x is a point in its domain.

Gradient projection method. A feasible direction method by projecting the gradient into the working surface, {x: Ax=b}. Suppose A has full row rank. Then, P = I - A'Inv(AA')A projects any vector into the null space of A: A[Py] = [A-A]y = 0 for all y in R^n. The form of an iteration is x' = x + sd, where d is the projected gradient, Pgrad_f(x), and s is determined by line search. Since Ad=0, Ax'=Ax, thus staying in the working surface. (This extends to nonlinear constraints by using the same correction procedure as the generalized reduced gradient method.)

Graph. In one context, this is the functional value and domain: {(x, z): x in X and z=f(x)}, where f:X-->R. In another context, this is a (maybe undirected) network. In the former context, see also epigraph and hypograph. In the latter context, the notation [V,E] is sometimes used to mean a graph with vertex (or node) set V and edge (or link) set E. We say an edge is incident with the two nodes that define it, and those two nodes are called the endpoints of the edge. The endpoints are said to be adjacent nodes.

An edge whose endpoints are the same node is called a loop. Two edges with the same endpoints are called parallel. A graph is simple if it has no loops or parallel edges. Otherwise, it is called a multigraph. The degree of a node is the number of edges incident to it (counting a loop twice). If two edges have a common endpoint, they are said to be adjacent. A path is a sequence of edges that are successively adjacent. (If the graph is simple, a path can also be represented by the associated sequence of nodes.)

When the edge is directed (or oriented), it is called an arc. When all edges are directed, the graph is called a directed graph, or digraph. With additional properties (e.g., arc weights), this is a network. A path in a digraph can traverse its arcs in either the forward or backward direction. If all arcs are forward, it is a directed path, or dipath.

    Here are some special graphs of interest (not exhaustive).
  • bipartite. nodes decompose into 2 sets such that every link has its endpoints in opposite sets. This applies, for example, to the standard transportation problem, where there are sources in one set and destinations in the other.
  • complete. There is a link between each pair of nodes. A complete graph on n nodes is usually denoted K_n. In the case of a bipartite graph, the term bicomplete is used to mean all possible links are there: between all pairs of nodes in the two sets. A bicomplete graph on sets with m and n nodes, resp., is usually denoted K_mn.
  • connected. There is a path between each pair of nodes. If the graph is directed, this divides into:
    1. strongly connected - arcs must be traversed in forward direction;
    2. weakly connected - arcs can be traversed in either direction.
  • cycle (denoted C_n for n nodes). The links are (1,2), (2,3), ..., (n-1,n), (n,1).
  • planar. The graph can be drawn in a plane (2-D) such that no two links cross (they meet only at incident nodes). This has special properties in multi-commodity flows and shortest paths.
  • subgraph. [V',E'] such that V' is a subset of V and E' is a subset of E.
  • tree. Connected and acyclic. One problem is finding a spanning tree that is optimal in some sense.
  • undirected. No link is directed.

GRASP. This stands for Greedy Randomized Adaptive Search Procedures. It is a metaheuristic that uses restart to search many portions of the solution space. Once it is started, some Heuristic is employed to reach a local optimum. The choice for the local search can be very problem-specific, such as using an n-Opt neighborhood for the TSP, or it can use another metaheuristic, like simulated annealing or tabu search.

Greedy algorithm. Applies when the optimization problem is to decide whether or not to include some element from a given set. A greedy algorithm begins with no elements and sequentially selects an element from the feasible set of remaining elements by myopic optimization. (The elements could have been sorted by some criterion, such as associated weights.) This results in an optimal solution to the problem if, and only if, there is an underlying matroid structure (for example, see spanning tree). Further details are in the supplement, Greedy Algorithms.

Gröbner basis. This arises in parametric   integer (linear) programming, varying the right-hand side (b). A precise definition is involved, stemming from ideal theory in abstract algebra, and there are many other applications outside of integer programming. In the context of ILP, a Gröbner basis is a minimal set of change directions, {h1, ..., hk}, called a test set, such that the objective function improves along those directions. To elaborate, suppose we have

ILP(b):   min cx:   Ax = b,   x in Zn+,

where A and b have integer elements. Let X*(b) = argmin{cx: Ax = b, x in Zn+} and B = {b: X*(b) not empty} (i.e., ILP(b) is feasible and bounded). A Gröbner basis for A, holding c fixed, is a set, Gc = {hi}, for which the following conditions hold:

  1. Ah = 0.
  2. ch < 0 (for minimization).
  3. For b in B and x not in X*(b), there exists h in Gc such that x + h >= 0.

Condition 1 says that A(x+h)=b if Ax=b, and condition 2 says c(x+h) < cx. Therefore, if x is any feasible solution to ILP(b), either x is optimal for a particular right-hand side b, or condition 3 ensures there is an improvement by adding some vector, h in Gc.

Here is a numerical example.

Group problem. Solving the corner polyhedron problem from the vantage of the Gomory group.

 


Notation

Send questions and comments to icsMPGlossary@mail.informs.org.
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2008, Mathematical Programming Glossary, by the INFORMS Computing Society