Mathematical Programming Glossary - R

Randomized program. A randomized policy is a distribution over the policy set, X, that describes a policy as a random variable. This is not restricted to stochastic programs. The randomized form is called a randomized program, and it is the following semi-infinite program in response space:

Max Sum_x{w(x)f(x)}: w in W, Sum_x{w(x)g(x)} <= 0, Sum_x{w(x)h(x)} = 0,

where W = {w: X-->[0, 1]: w(x) > 0 for finitely many x in X, and Sum_x{w(x)}=1}.

In this form, the randomized program is an ordinary linear program if X is finite. More generally, the definition of W renders the summations well defined. One could interpret w as a randomized policy: use x with probability w(x). A pure strategy is when w(x*)=1 for some x* (so w(x)=0 for x not= x*); otherwise, w is called a mixed strategy.

One key fact is that the solutions to the original mathematical program (in standard form) correspond to pure strategies in this randomized form. Further, a key to the Lagrangian Strong Duality Theorem is that every mixed strategy is dominated by a pure strategy. Moreover, this underlies the Generalized Lagrange Multiplier method, and there is no loss in optimality to restrict mixed strategies to satisfy the Haar condition: |{x: w(x) > 0}| <= m+1.

Range of Compatibility. See Compatibility theory.

Range constraint. A constraint of the form a <= g(x) <= b.

Rank-one correction. This is an update to an approximation of a matrix, notably to the hessian inverse, by adding a matrix of rank 1 to it. This generally takes the form: H'=H + vv', where v is a nonzero column vector (so vv' is a rank 1 matrix).

Rank-two correction. This is an update to an approximation of a matrix, notably to the hessian inverse, by adding a matrix of rank 2 to it. This generally takes the form of two rank-one updates: H' = H + vv' + uu', where v and u are linearly independent column vectors (so [vv'+uu'] is a rank 2 matrix).

Rate of convergence. See convergence.

Rates of substitution. Arises when equations are split into dependent (or basic) variables and independent (or nonbasic) variables. In linear programming, we rewrite Ax=b as Bu + Nv = b, where u is the vector of basic variables and v is the vector of nonbasics. Then, the original equations are equivalent to u = b' + Mv, where b' = B–1b and M = -B–1N. This implies that the rate of substitution between u_i and v_j is M(i, j) because it describes the marginal rate at which u_i must change in response to a change in v_j to maintain the equations with all other v's held fixed.

Ray. A translated half-line: {x + th: t >= 0}, where h is a recession direction. We call x the root, and we say the ray is rooted at x. (Also see extreme ray.)

Recession cone (also called characteristic cone), of a given set, S. The set of recession directions for S: {h: {x+th: t >= 0} is in S for some x in S}. For example, let rc(S) denote the recession cone of S. Then, rc(S)=S if S=R^n or if S={0}. If S = {x: Ax <= b}, rc(S) = {h: Ah <= 0}.

Recession direction. At x in S, this is a nonzero vector, h, for which the associated ray is contained in S. If h is a recession direction for some x in S, it is called a recession direction for S.

Recourse model. See stochastic program.

Reduce. Same as presolve.

Reduced cost. In linear programming, it is the sum of the direct cost (c_j) of a nonbasic variable (j) and the indirect costs from induced changes in the levels of the basic variables (to satisfy the equations). For a dual price vector, p, the reduced cost vector is is c - pA.

Reduced gradient method. This applies to the case of linear constraints: Max{f(x): Ax = b, x >= 0}, where rank(A)=m and f is in C^2. The variables are partitioned into x = (v,w), with the corresponding partition of A = [B C], such that the mathematical program is equivalent to:

Max{f(v, w): Bv + Cw = b, (v, w) >= 0}.

The method assumes B is nonsingular (i.e., only variables with linearly independent columns in A can be grouped), and the nondegeneracy assumption: v > 0. Now dw is the independent direction of change, and dv = -B–1[C dw], thus keeping x+dx = (v+dv, w+dw) on the hyperplanes -- i.e., A[x+dx] = Ax + A[dx] = b + 0 = b.

The method considers a direction for the independent variables, which then determines the direction for the dependent variables. In particular, dw is chosen by evaluating the gradient of the objective: f(B–1[b-Cw,w]). This gradient (with respect to w) is called the reduced gradient:

r = grad_w[f(x)] – grad_v[f(x)]B–1C

at x = (v, w). Then, dw = r completes the first part of the iteration. The second part is to select a step size, which can be blocked by the non-negativity of v. The resulting change can cause the working set to change such that the partition changes.

(Also see generalized reduced gradient method.)

Redundant constraint. A constraint whose removal does not change the feasible region. Suppose g_i(x) <= 0 is redundant and S_i denotes the feasibility region without this constraint (so x in S_i implies g_i(x) <= 0). Then, the constraint is strongly redundant if x in S_i implies g_i(x) < 0; it is weakly redundant if it is redundant and g_i(x) = 0 for some x in S_i.

Refinery problems. There are a myriad of these, including fluid blending problems, particularly the pooling problem, to determine a least costly operation to satisfy demands for products.

Reformulation. Obtaining a new formulation of a problem that is in some sense better, but equivalent to a given formulation. For example, consider the packing constraint: at most 1 element can be selected from {1,...,n}. Letting x_j be the associated binary variable, the following formulations have the same feasibility region:

  1. x_i + x_j <= 1 for all i < j, j=2,...,n;
  2. x_1 + x_2 + ... + x_n <= 1.

Formulation 2 is better because its LP relaxation is sharper: the relaxed region for 2 is a proper subset of the relaxed region for 1.

Reformulation-Linearization Technique (RLT). The Reformulation-linearization-technique (rlt) is a general methodology for recasting difficult optimization problems into higher-dimensional spaces for the purpose of obtaining tight bounds. As suggested by the name, it consists of the two basic steps of reformulation and linearization. Given a mixed 0-1 linear program in n binary variables, the reformulation step creates additional redundant nonlinear constraints by multiplying the constraints by product factors of the binary variables x and their complements (1-x), and subsequently enforces the idempotent identity that x^2=x. The linearization step then substitutes a continuous variable for each distinct product of variables. A hierarchy of relaxations result, depending on the form of the product factors employed. An explicit algebraic characterization of the convex hull is available at the highest level, level-n. The method is also applicable to mixed 0-1 polynomial and to continuous, nonconvex programs. Furthermore, it can be applied to general discrete optimization problems by using product factors of Lagrange interpolating polynomials.

Regular point. This pertains to a mathematical program whose functions are in C^1, and the issue is whether the Lagrange Multiplier Rule is valid. A regular point is a point that satisfies some constraint qualification, but some authors are more specific and require the Lagrange constraint qualification:

Let G(x) denote the matrix whose rows are the gradients of all active constraints. Then, x is a regular point if G(x) has full row rank.
This gives additional properties (e.g., see the tangent plane).

In this context, a regular point is also called Lagrange regular. The mathematical program is [Lagrange] regular if every feasible point is [Lagrange] regular.

Relative interior. The interior of a set when considered in the smallest subspace containing it. In particular, the polyhedron, {x: Ax >= b}, can be defined as the intersection of the inequalities that are forced, say {x: Qx=q}, and the others, say {x: Px >= p} (so A=[P Q] and b=[p q]). Then, the relative interior of the original polyhedron is {x: Qx=q and Px > p}.

Relaxation. P' is a relaxation of P if: (1) the feasible region of P' contains the feasible region of P, and (2) the objective value in P', say F(x), is no worse than that of P, say f(x), for all x in the domain of P (for maximization, this means F(x) >= f(x) for all x in X).

    Some of the most common are as follows:
  1. dropping integer restrictions (viz., we refer to the LP relaxation of a (linear) MIP);
  2. dropping some constraints (viz., an active set method).
  3. aggregating constraints (viz., using surrogate constraint);
  4. using duality (viz., GLM).

One relaxation is sharper than another if its objective value is never further from the original. One way this can occur is for the objectives to be the same and the feasibility region to be less. In particular, one LP relaxation of a MIP is sharper than another if its feasible region is contained in the other.

Reoptimization. A term sometimes used to mean a mathematical program is solved after some parameter values have been changed. The idea is to use the original solution, and perhaps some information generated by the algorithm, to solve the related problem more efficiently than starting from scratch. An example is using an advanced basis to begin the simplex method.

Reproduction operation. See genetic algorithm.

Requirements space. Originally coined in linear programming, this represents the set of non-negative linear combinations of the columns of the matrix (A): {Ax: x >= 0}. For an LP in standard form, the requirements space is the set of feasible right-hand sides, and it is a convex, polyhedral cone.

Residuals. The difference, h(x)-b, when seeking a solution to the system of equations, h(x)=b. In particular, if we seek to solve Ax=b, and x is some candidate solution, reached by an algorithm, the residual is r=Ax-b.

Response space. The set of [sub-]range values for the functions of a mathematical program in standard form:

{(b, c, z): g(x) <= b, h(x)=c and f(x) >= z for some x in X}.

(Examples)

Restricted basis entry rule. This is a restriction on which variables can enter the basis in a simplex method. A common rule arises in separable programming, which uses specially ordered sets: a group of non-negative variables must sum to 1 such that at most two variables are positive, and if two are positive, they must be adjacent. For example, suppose the variables are (x1,x2,x3). Then, it is feasible to have (.5,.5,0) and (0,.2,.8), but it is not feasible to have (.5,0,.5) or (.2,.2,.6). In this case the rule is not to permit a variable to enter the basis unless it can do so without violating the adjacency requirement. For example, if x1 is currently basic, x3 would not be considered for entry.

Another restricted entry rule pertains to the delta form of separable programming (plus other applications): Do not admit a variable into the basis unless its predecessor variables are at their upper bound. This means there is an ordered set of bounded variables, say (x1,x2,...,xn) such that 0 <= x <= (U1,...,Un). Then, xk is not considered for basis entry unless xj=Uj for all j < k.

Reverse convex constraint. g(x) >= 0, where g is a convex function.

Reverse convex program. Minimizing a concave function, usually on a polyhedron. If the feasibility region is convex (as in the case of a polyhedron), non-empty and compact, it has an extreme point that is optimal.

Right-hand side (RHS). When considering constraints of the form g(x) <= b and h(x)=c, the vector (b,c) is called the right-hand side.

Rim data. In a linear program, the data values are (A, b, c) plus (possibly) bounds, say L <= x <= U. The A matrix pertains to row-column information, whereas all bounds (L, U), right-hand sides (b), and cost coefficients (c) pertain to a column or a row. The latter are called rim data elements because of their position in the following schematic:

                    column name
       ========================|
        obj name   |     c     |...Min or Max
       ========================|
        row name   |     A     | = b
       ========================|
        Lower bound|     L     |
        Upper bound|     U     |
       =========================    

Robust optimization. A term given to an approach to deal with uncertainty, similar to the recourse model of stochastic programming, except that feasibility for all possible realizations (called scenarios) is replaced by a penalty function in the objective. As such, the approach integrates goal programming with a scenario-based description of problem data. To illustrate, consider the LP:

Min cx + dy: Ax=b, Bx + Cy = e, x, y >= 0,

where d, B, C and e are random variables with possible realizations {(d(s), B(s), C(s), e(s): s in {1,...,N}} (N = number of scenarios). The robust optimization model for this LP is:

Min f(x, y(1), ..., y(N)) + wP(z(1), ..., z(N)): Ax=b, x >= 0,

B(s)x + C(s)y(s) + z(s) = e(s), and y(s) >= 0, for all s = 1,...,N,

where f is a function that measures the cost of the policy, P is a penalty function, and w > 0 (a parameter to trade off the cost of infeasibility). One example of f is the expected value: f(x, y) = cx + Sum_s{d(s)y(s)p(s)}, where p(s) = probability of scenario s. In a worst-case model, f(x,y) = Max_s{d(s)y(s)}. The penalty function is defined to be zero if (x, y) is feasible (for all scenarios) -- i.e., P(0)=0. In addition, P satisfies a form of monotonicity: worse violations incur greater penalty. This often has the form P(z) = U(z^+) + V(-z^-) -- i.e., the "up" and "down" penalties, where U and V are strictly increasing   functions.

The above makes robust optimization similar (at least in the model) to a goal program. Recently, the robust optimization community defines it differently – it optimizes for the worst-case scenario. Let the uncertain MP be given by

min f(x; s): x in X(s),

where S is some set of scenarios (like parameter values). The robust optimization model (according to this more recent definition) is:

minx {maxs in S f(x; s)} x in X(t) for all t in S,

The policy (x) is required to be feasible no matter what parameter value (scenario) occurs; hence, it is requied to be in the intersection of all possible X(s). The inner maximization yields the worst possible objective value among all scenarios. There are variations, such as "adjustability" (i.e., recourse).

Rosenbrock function. The original form has 2 variables:

f(x1, x2) = 100(x2 - x12)2 + (1 - x1)2

The significance of this function is that it has "bannana-shaped" contours, making it difficult for nonlinear programming algorithms, particularly
Cauchy's steepest descent, to solve it. The (global) minimum is at x* = (1, 1) with f(x*) = 0. There are variations that extend Rosenbrock's function, many by Schittkowski, such as the following on Rn:

f(x) = Sum1 <= j <= n/2 {100(x2j - x2j-12)2 + (1 - x2j-1)2 }.

Rosen's decomposition. Essentially the dual of Dantzig-Wolfe decomposition, this applies to a linear program with linking activities:

           .----.                  .-----.
           | B1 |                  | L A |
           |____|                  | i c |
                 .----.            | n t |
                 | B2 |            | k i |
                 |____|            | i v |
                       .           | n i |
                         .         | g t |
                           .       |   i |
                            .----. |   e |
                            | Bk | |   s |
                            |____| |_____|    

Routing problems. Finding a path or cycle in a network. An easy routing problem is the shortest path; a hard one is the TSP. One prevalent class, with many variations, is vehicle routing.