Note: This version of the glossary is obsolete and is not maintained.
Please use

Mathematical Programming Glossary - A
Mathematical Programming Glossary - S
Saddle point. Let f:X × Y –> R. Then, (x*, y*) is a saddle point of f if x* minimizes f(x, y*) on X, and y* maximizes f(x*, y) on Y. Equivalently,

f(x*, y) <= f(x*, y*) <= f(x, y*) for all x in X, y in Y.

von Neumann (1928) proved this equivalent to:

Inf{Sup{f(x, y): y in Y}: x in X} = Sup{Inf{f(x, y): x in X}: y in Y} = f(x*, y*).

A sufficient condition for a saddle point to exist is that X and Y are non-empty, compact, convex sets, f(.,y) is convex on X for each y in Y, and f(x,.) is concave on Y for each x in X. Saddle point equivalence underlies duality.

Satisfiability problem (SAT). Find a truth assignment to logical propositions such that a (given) collection of clauses is true (or ascertain that at least one clause must be false in every truth assignment). This fundamental problem in computational logic forms the foundation for NP-completeness. When SAT is in CNF and may not be satisfiable, we define the problem of maximizing the number of true clauses as MAXSAT.

Scaling. Changing the units of measurement, usually for the numerical stability of an algorithm. The variables are transformed as x' = Sx, where S=diag(sj). The diagonal elements are the scale values, which are positive: s1..., sn > 0. Constraint function values can also be scaled. For example, in an LP, the constraints Ax=b, can be scaled by RAx=Rb, where R=diag(ri) such that r > 0. (This affects the dual values.) Some LP scaling methods simply scale each column of A by dividing by its greatest magnitude (null columns are identified and removed).

Example A column scaling A row scaling
10x + 100y = 500
–30x + .3y = .2
.3333x + y = 500
–x + .003y = .2
x + 10y = 50
–300x + 3y = 2

Another method is logarithmic scaling, which scales by the logarithm of the greatest magnitude. More sophisitcated methods are algorithmic, taking both row and column extremes into account.

Scatter search. This is a population-based metaheuristic, starting with a collection of reference points, usually obtained by the application of some heuristic. A new point is created by taking combinations of points in the population, and rounding elements that must be integer-valued. It bears some relation to a genetic algorithm, except that scatter search uses linear combinations of the population, while the GA ``crossover operation'' can be non-linear.

Scheduling (e.g., jobs). A schedule for a sequence of jobs, say j1,..., jn, is a specification of start times, say t1,...,tn, such that certain constraints are met. A schedule is sought that minimizes cost and/or some measure of time, like the overall project completion time (when the last job is finished) or the tardy time (amount by which the completion time exceeds a given deadline). There are precedence constraints, such as in the construction industry, where a wall cannot be erected until the foundation is laid.

There is a variety of scheduling heuristics. Two of these for scheduling jobs on machines are list heuristics: the Shortest Processing Time (SPT), and the Longest Processing Time (LPT). These rules put jobs on the list in non-decreasing and non-increasing order of processing time, respectively.

Other scheduling problems, which might not involve sequencing jobs, arise in production planning.

Search tree. The tree formed by a branch and bound algorithm strategy. It is a tree because at each (forward) branching step the problem is partitioned into a disjunction. A common one is to dichotomize the value of some variable, x <=v or x >=v+1. This creates two nodes from the parent:

              [parent node]
                  / \
          x <= v /   \ x >= v+1
                /     \
      [left child]   [right child] 

Secant method. A method to find a root of a univariate function, say F. The iterate is

                          F(x^k) [x^k - x^(k-1)]
         x^(k+1) = x^k - ------------------------. 
                           F(x^k) - F(x^(k-1))      
If F is in C^2 and F''(x) not= 0, the order of convergence is the golden mean, say g (approx.= 1.618), and the limiting ratio is:
               | 2 F'(x) |(g-1)
               | ------- |
               | F''(x)  |        

Second-order conditions. This descends from classical optimization, using the second-order term in Taylor's expansion. For unconstrained optimization, the second-order necessary condition (for f in C^2) is that the hessian is negative semi-definite (for a max). Second-order sufficient conditions are the first-order conditions plus the hessian be negative definite. For constrained optimization, the second-order conditions are similar, using projection for a regular mathematical program and the Lagrange Multiplier Rule. They are as follows (all functions are in C^2, and the mathematical program is in standard form, for x* a local maximum):

Second-order necessary conditions. There exist Lagrange multipliers, (u,v), such that u >= 0 and ug(x*)=0 for which: (1) grad_x[L(x*,u,v)]=0, and (2) H_x[L(x*,u,v)] is negative semi-definite on the tangent plane.

Second-order sufficient conditions. The above necessary conditions hold but with (2) replaced by (2') H_x[L(x*,u,v)] is negative definite on the tangent plane.

Self concordance. Properties of a function that yield nice performance of Newton's method used for line search when optimizing a barrier function. Specifically, let B be a barrier function for S = {x in X: g(x) <= 0} with strict interior S^0. Let x be in S and let d be a direction vector in R^n such that the line segment [x-td, x+td] is in S for t in [0, t*], where t* > 0. Then, define F:[0,t*]–>R by:

F(t) = B(x + td)

(while noting that F depends on x and d). The function F is self-concordant if it is convex in C3 and satisfies the following for all x and d:

|F'''(0)| <= 2F''(0)3/2.

One calls F k-self-concordant in an open convex domain if

|F'''(0)| <= 2k F''(0)3/2.

The logarithmic barrier function, associated with linear programming, is self-concordant with k=1. This further extends naturally to functions in Rn.

Semi-assignment problem. This is like the assignment problem, except only the rows or the columns (not both) of the assignment matrix is constrained to equal 1. With linear objective, the problem is:

Min sum i,j {C(i, j) X(i, j)}: X >= 0,   sumi X(i, j) = 1 for all j.

(The sum constraint could be over j, for all i.) An example is to assign jobs to people, but allow some jobs to be assigned to more than one person and some jobs not assigned at all. A more realistic example arises in biology. We are given a rotamer library of side chain confirmations, and each amino acid residue in a protein must be assigned some rotamer. The same rotamer could be assigned to more than one site, and some rotamers need not be assigned at all.

Semi-definite program. Min{cx: S(x) in P}, where P is the class of positive semi-definite matrices, and S(x) = S_0 + Sum_j{x(j)S_j}, where each S_j, for j=0,...,n is a (given) symmetric matrix. This includes the linear program as a special case.

Semi-infinite program. A mathematical program with a finite number of variables or constraints, but an infinite number of constraints or variables, respectively. The randomized program is a semi-infinite program because it has an infinite number of variables when X is not finite.

Sensitivity analysis. The concern with how the solution changes if some changes are made in either the data or in some of the solution values (by fixing their value). Marginal analysis is concerned with the effects of small perturbations, maybe measurable by derivatives. Parametric analysis is concerned with larger changes in parameter values that affect the data in the mathematical program, such as a cost coefficient or resource limit.

Under suitable assumptions, the multipliers in the Lagrange Multiplier Rule provide derivatives of the optimal response function – i.e., under certain conditions, (u, v) = grad_f*(b, c). For special approaches in LP, see compatibility theory, the 100% Rule, and the tolerance approach.

Separable program. The functions are separable: f(x) = Sum_j{f_j(x_j)}, g(x) = Sum_j{g_j(x_j)}, and h(x) = Sum_j{h_j(x_j)}. The classical (LP) approaches to separable programming are called lambda-form and delta-form, both using piece-wise linear approximations.

Let {x^k} be a specified set of points, where x^k = [x(k,1), x(k,2), ..., x(k,n)], and let y={y(k,j)} be decision variables that are the coefficients of convex combinations, giving the following linear program:

Max Sum_kj{y(k, j) f_j(x(k, j))}: y >= 0, Sum_k{y(k, j)} = 1 for each j,

Sum_kj{y(k, j) g_j(x(k, j))} <= 0, Sum_kj{y(k, j) h_j(x(k, j))} = 0.

A restricted basis entry rule is invoked during the simplex method to yield an approximate solution. (However, this is dominated by the Generalized Lagrange Multiplier method, which can be viewed as generating the approximating breakpoints posteriori, getting successively finer near the solution.)

The delta form uses the differences: u(k, j) = x(k, j) - x(k-1, j). The associated functional differences are:

Df(k, j) = f_j(x(k, j)) - f_j(x(k-1, j));

Dg(k, j) = g_j(x(k, j)) - g_j(x(k-1, j));

Dh(k, j) = h_j(x(k, j)) - h_j(x(k-1, j)).

Then, the approximating LP is:

Max Sum_kj{Df(k, j) u(k, j)}: 0 <= u <= 1;

Sum_kj{Dg(k, j) u(k, j)} <= b, Sum_kj{Dh(k, j) u(k, j)} = c,

where b = -Sum_j{g_j(x(0, j))} and c = -Sum_j{h_j(x(0, j))} (a similar constant was dropped from the objective). Another restricted basis rule is invoked: u(k,j) > 0 implies u(k,q)=1 for all q < j and all k.

Separating hyperplane. A hyperplane for which two (given) sets lie in opposite halfspaces. The separation is strict if the two sets are contained in their respective open halfspace.

              ____   |   ____
             |    |  |   \  /
             | S1 |  |   /S2\
             |____|  |   \__/ 
                   hyperplane (line) 
Here is a graphic.

Sequencing problems. Finding an ordering, or permutation, of a finite collection of objects, like jobs, that satisfies certain conditions, such as precedence constraints.

Sequential decision process. See dynamic programming and Time-staged models.

Sequential Linear Programming (SLP). Solving a nonlinear program by a sequence of linear approximations and using linear programming to solve each one. The linear approximations are usually done by using the first-order Taylor expansion.

Sequential Quadratic Programming (SQP). Solving a nonlinear program by a sequence of quadratic approximations and using quadratic programming to solve each one. The approximations are usually done by using the second-order Taylor expansion.

Sequential Unconstrained Minimization Technique (SUMT). This is the penalty function approach.

Set covering problem. See covering problem.

Shadow price. An economic term to denote the rate at which the optimal value changes with respect to a change in some right-hand side that represents a resource supply or demand requirement. (This is sometimes taken as synonymous with the dual price, but this can be erroneous, as in the presence of degeneracy – see marginal price.) Also see pricing.

Sherman-Morrison formula. This is a useful identity:

                              A–1 ab' A–1
          [A + ab']–1 = A–1 – -----------,
                              1 + b'A–1a    
where A is a nonsingular n×n matrix and a and b are n-vectors, provided the inverse (on the left) exists.

Shortest path. In a graph or network, this is a path from one node to another whose total cost is the least among all such paths. The "cost" is usually the sum of the arc costs, but it could be another function (e.g., the product for a reliability problem, or max for a fuzzy measure of risk). There are some particular labeling algorithms given.

Signomial. The difference between two posynomials. This class of function defines the general geometric program.

Simplex (pl. simplices). {x in R: Sum{xj} = 1}. For n=1, this is a point (x=1). For n=2, this is a line segment, joining points (1,0) and (0,1). For n=3, this is a triangle, joining the vertices (1,0,0), (0,1,0), and (0,0,1). This is sometimes called an n-simplex, denoted by Sn (note its dimension is n-1). The open simplex excludes the axes: {x in Sn: x > 0}.

More generally, some authors define a simplex to be the convex hull of any n+1 affinely independent vectors, and refer to the special case of the unit vectors as the standard simplex.

Simplex method. An algorithm invented to solve a linear program by progressing from one extreme point of the feasible polyhedron to an adjacent one. The method is an algorithm strategy, where some of the tactics include pricing and pivot selection.

The elementary simplex method is the name of Dantzig's original (1947) algorithm, with the following rules applied to the standard form: Min{cx: Ax=b, x >= 0}.

    Let dj = reduced cost of xj; terminate if dj >= 0 for all j.
  1. Select dj < 0 as one of greatest magnitude.
  2. In the associated column (j) of the tableau, compute the min ratio: xi / a(i, j): a(i, j) > 0. (If a(., j) <= 0, LP is unbounded).
  3. Enter xj into the basic set, in exchange for xi, and update the tableau.

Among the variations are:

The revised simplex method is the use of a particular factored form of the basis: B = [E_1 E_2 ... E_k] (after k iterations), where each Ei is an elementary matrix. Then, the revised simplex method uses forward transformation to pivot and backward transformation to update the pricing vector. (For this distinction, the elementary simplex method is sometimes called the tableau method.)

The simplex method draws its name from imagining a normalization constraint, Sumj{xj} = 1, and thinking of the j-th column of A to be selected by the weight xj in S_w. Then, at an iteration, an m-simplex is specified by the basic variables, and an adjacent simplex is chosen to improve the objective value. This view is in requirements space.

(For a different algorithm, used as a heuristic in NLP, see the Nelder-Mead simplex method.)

Simplex multiplier. See pricing.

Simplicial subdivision. Given a simplex, S, its simplicial subdivision is a collection of simplices, say {Ti} such that \/i{T_i} = S and for any i and j, either the intersection T_i/\Tj is empty or equals the closure of a common face. The mesh of the subdivision is the diameter of the largest sub-simplex. This arises in a fixed point approach to compute an economic equilibrium.

Simulated annealing. An algorithm for solving hard problems, notably combinatorial programs, based on the metaphor of how annealing works: reach a minimum energy state upon cooling a substance, but not too quickly in order to avoid reaching an undesirable final state. As a heuristic search, it allows a non-improving move to a neighbor with a probability that decreases over time. The rate of this decrease is determined by the cooling schedule, often just a parameter used in an exponential decay (in keeping with the thermodynamic metaphor). With some (mild) assumptions about the cooling schedule, this will converge in probability to a global optimum.

Skew symmetric matrix (A). A is square and A' = -A.

Slack variable. In an inequality constraint of the form g(x) <= b, the slack is b-g(x), which is designated by the slack variable, s. Then, the original constraint is equivalent to the defining equation, g(x) + s = b, plus s >= 0.

Slater's (interiority) condition. Originally for the purely inequality system with g convex, it means there exists x for which g(x) < 0. More generally, for a mathematical program in standard form, it means there exists x in X for which g(x) < 0 and h(x) = 0.

Spanning tree [problem]. A subgraph that is a tree containing all nodes. The max weight spanning tree problem is to find a spanning tree such that the sum of (given, positive) weights of the edges is a maximum.

    The max spanning tree problem is solvable by the following greedy algorithm:
    Input. connected graph with weights, w1 >= ...>= wm.
    Output. maximum weight spanning tree, T.
  • Initialization: Set k=1; T = graph with given nodes and no edges.
  • Iteration (until k=m-1): Test if the k-th edge forms a cycle with T. If not, add it to T; if so, discard the edge. In either case, increment k and iterate.
Here is a supplement, Greedy Algorithms for Minimum Spanning Tree.

Sparsity. The fraction of zeroes in a matrix. If A is m by n, and A(i, j) ≠ 0 for k of its elements, its sparsity is k/(mn). Large linear programs tend to be very sparse, increasingly so as the dimensions grow. For example, consider the standard transportation problem with s sources and d destinations. This has m = (s+d) constraints and n = sd variables. Each column, however, has exactly 2 nonzeroes since A is the incidence matrix of the network, so its sparsity is 2n/(mn) = 2/m, which decreases as the number of sources and/or destinations grows large.

The sparsity of a simple graph (or network) is the sparsity of its adjacency matrix. More generally, the sparsity of a multigraph refers to the average degree of its nodes.

See super sparsity for extensions of this idea.

Specially ordered set (SOS). These are sets of non-negative variables that are required to sum to 1. For computational efficiency, it is sometimes better to define these sets by some marking data structure, rather than include them along with other equality constraints. There are two types of SOSs, distinguished by what they represent. A Type 1 SOS is when each variable is binary, so the constraint is one of multiple choice. A Type 2 SOS is when a restricted basis entry rule is used, as in the lambda-form of separable programming.

Spectral radius of a matrix, A. The radius of a disk that contains the spectrum: r(A) = Max{|y|: y is an eigenvalue of A}.

Spectrum of a matrix, A. Its set of eigenvalues.

Speed of convergence. See convergence.

Stability region. The set of parameter values for which an optimal solution remains optimal. This arises naturally when solving combinatorial programs, where a solution is often a subgraph, such as a tree, and the question is for what range of arc weights is this subgraph optimal (such as a spanning tree that is minimum for given weights). More generally, x could be a solution generated by some algorithm, A, from an initial value x0. Then, suppose the feasibility region depends on the parameter p, say F(p), and the objective also depends on p, say f(x;p). Let X(p,A,x0) denote the generated solution from algorithm A, starting at x0, with parameter value p. Let the parameter set be P (which includes p*). The stability region of x* = X(p*,A,x0) is {p in P: x* = X(p,A,x0)}. The algorithm may be a heuristic, so x* need not be optimal. For example, one could use an n-Opt heuristic for the travelling salesman problem, so x represents a tour. The parameters could be the costs, or they could be the location of each point in a Euclidean TSP. The stability region is the set of costs, or coordinates in the plane, for which the tour generated by n-Opt is the same.

Stable mathematical program. Roughly, one whose solution does not change much under perturbation. For the inequality case, we have the following

    Stability conditions:
  1. {x in X: g(x) <= b} is bounded for some b > 0.
  2. cl{x in X: g(x) < 0} = {x in X: g(x) <= 0}.

The first stability condition pertains to upper semi-continuity and the second, called the closure condition, pertains to lower semi-continuity.

    The conditions are not only sufficient to ensure the respective semi-continuity, but they are necessary when:
  1. {x in X: g(x) <= 0} is bounded.
  2. {x in X: g(x) < 0} is not empty.

Stable set. Same as independent set

Standard linearization. The standard way to linearize the product of two binary variables x and y is to replace xy with a continuous variable w and add four linear inequalities as auxiliary constraints:

w <= x, w <= y, w >= x + y - 1, and w >= 0.
Collectively, these imply w = xy for all binary values of x and y. This can be generalized for the product of binary variables x_j for all j in some index set J by replacing prod_{j in J} x_j with a continuous variable w and adding |J| + 2 auxiliary constraints:
w <= x_j for all j in J, w >= sum_{j in J} x_j - (|J| - 1), and w >= 0.

Stationary point. Usually this is used to mean a Kuhn-Tucker point, which specializes to one for which grad_f(x)=0 if the mathematical program is unconstrained. In the context of an algorithm, it is a fixed point.

Stationary policy. In a dynamic program, this is a policy that is independent of time – i.e., x*(s,t) = T(s) (some function of state, but not of time, t).

Steel beam assortment problem. A steel corporation manufactures structured beams of a standard length, but a variety of strengths. There is a known demand of each type of strength, but a stronger one may fulfill demand (or part thereof) for another beam (but not conversely). The manufacture of each type of steel beam involves a fixed charge for its setup. In addition, there is a shipping cost proportional to the difference in the demanded strength and the actual strength, and proportional to the quantity shipped.

Let N = number of varieties of strengths D(t) = demand for beam of strength s_t (where s_1 >= s_2 >= ... >= s_N) x(t) = amount of beams of strength s_t manufactured p_t(x(t)) = manufacturing cost of x(t) units of beam of strength s_t (incl. fixed charge) y(t) = total excess of beams of strength s_1, ..., s_t before fulfilling demand D(t+1), ..., D(N) h_t(y(t)) = shipping cost (= c[s_(t+1)-s_t]Min{y(t),D(t)}.

Although t does not index time, the mathematical program for this problem is the same form as the production scheduling problem, using the inventory balance equations to relate y and x. This is valid because s_1 >= s_2 >= ... >= s_N implies y(t) can be used to fulfill demand D(t+1) + D(t+2) + ... + D(N). (Also, note that here h_t is not a "holding cost".)

Steepest ascent (descent, if minimizing). This is a class of algorithms, where x' = x + sd, such that the direction vector d is chosen by maximizing the initial "velocity" of change, and the step size (s) is chosen by line search. Generally used in the context of unconstrained optimization, the mathematical program is Max{f(x): x in R^n}, where f is in C^1. (For descent, change Max to Min.) Then, d is chosen to maximize the first-order Taylor approximation, subject to a normalization constraint: Max{grad_f(x)d: ||d||=1}, where ||d|| denotes the norm of the direction vector, d. When the Euclidean norm is used, this yields the original steepest ascent algorithm by Cauchy, which moves in the direction of the gradient:

d = grad_f(x) / ||grad_f(x)||.

(No direction vector is sought if grad_f(x)=0; such algorithms stop when reaching a stationary point.)

Other norms, such as ||d||^2 = d'Qd, where Q is symmetric and positive definite, lead to other directions that are steepest relative to that norm. In particular, if Q = H_f(x), this yields the modified Newton method.

Steiner problem. Find a subgraph of a graph, say G' = [V',E'], such that V' contains V* (a specified subset of nodes), and Sum{c(e): e in E'} is minimized. It is generally assumed c >= 0. When |V*|=2, this is the shortest path problem. When |V*|=|V|, this is the (minimum) spanning tree problem.

Step size. This is a scalar (s) in an algorithm of the form: x' = x + sd, where d is the direction vector. After d is chosen (nonzero), the step size is specified. One step size selection rule is line search; another is a fixed sequence, like s_k = 1/k.

Stochastic matrix. A non-negative matrix whose row sums are each 1. (A column stochastic matrix is one whose column sums are 1.) This arises in dynamic programs whose state transition is desribed by a stochastic matrix containing the probabilities of each transition.

Stochastic program. It is written in the form of a mathematical program extended to a parameter space whose values are random variables (generally with with known distribution function). This is converted to a standard form by forming a certainty equivalent.

    Here are some certainty equivalents:
  • Average value. Replace all random variables with their means.
  • Chance constraint. Given a stochastic program with a random variable, p, in its constraint: g(x; p) <= 0, a certainty equivalent is to replace this with the constraint, P[g(x; p) <= 0] >= a, where P[] is a (known) probability function on the range of g, and a is some acceptance level (a=1 means the constraint must hold for all values of p, except on a set of measure zero). Some models separate constraints with several levels:

    P[gi(x; p) <= 0 for all i in I_k] >= a_k   for k=1,...,K.

    The case of one chance constraint with the only random variable being the right-hand side is particularly simple. Suppose F is the cumulative distribution function of b for the chance constraint P[g(x) <= b] >= a. If b is a continuous random variable and F is continuous and strictly increasing, the chance constraint is equivalent to g(x) <= F^{-1}(1-a) (where F^{-1} is the inverse function of F). In particular, if g(x) = Ax, the program remains linear.
  • Recourse model. This assumes decisions are made over time where the effect of an early decision can be compensated by later decisions. The objective is to optimize the expected value. The 2-stage model has the form:

    Max f1(x1; p1) + f2(x2; p2): x1 in X1, x2 in X2, g(x1; p1) + g(x2; p2) <= 0.

    (Sums could be replaced by other operators.) Once x1 is implemented, p1 becomes known and x2 is chosen. The certainty equivalent is:

    Max E[f1(x1; p1) + F2(x1 | p1)]: x1 in X1, where

    F2(x1 | p1) = Sup{E[f2(x2; p2)]: x2 in X2(p2), g(x2; p2) <= -g(x1; p1)}

    for all p2 (except on set of measure zero).

    (E[] denotes the expected value.) The 'Sup' is used to define F2, the second stage value for a particular value of x1, because the choice of x1 might be infeasible. The nature of the recourse model is pessimistic: x must be chosen such that the original constraints hold no matter what the values of the random variables. With a finite number of possibilities, this means a system of constraints for each possible realization of p=(p1, p2). This extends recursively to a k-stage model.

    The linear 2-stage recourse model has the form:

    max E[c]x + E[F(x; p)]: Ax=b, x >= 0,


    F(x; p) = max d(p)y: W(p)y = w(p) - T(p)x,  y >= 0.

    Here the second stage variable is denoted y; it is determined after x has been set and the random variable p has been realized. The LP data depend on p as functions, d(p), W(p), w(p) and T(p). The fixed recourse model has W(p)=W. The complete recourse model assumes it fixed and {Wy: y >= 0} is all of R^m (where m = number of rows of W). This means that no matter what value of x is chosen for the first stage, there is feasible recourse (y). This is simple recourse if W=[I -I], so we can think of y as having two parts: ypos and yneg. The second stage LP simplifies to the following:

    max   dpos(p) ypos + dneg(p) yneg :   ypos, yneg >= 0,   ypos - yneg = w(p) - T(p)x.

Also see robust optimization. The certainty equivalent depends upon the underlying decision process. If it is adaptive, the recourse model applies (but RO might be more practical). The chance constraint model represents a notion of an allowed frequency of violations, as in environmental control models.

Strict interior. Let {x in X: g(x) <= b} be the level set of g. Then, its strict interior is {x in X: g(x) < b}. (This is not to be confused with the relative interior or the set interior - see Myth NLP-6.)

Strict optimum. Same as proper optimum.

Strictly complementary. Each complementary pair of variables must have exactly one zero (the other positive).

Strictly concave function. Negative is strictly convex.

Strictly convex function. A convex function that also satisfies the defining inequality strictly for distinct points, say x and y:

f(ax + (1-a)y) < af(x) + (1-a)f(y) for all a in (0,1).

Strictly quasiconcave function. Negative is strictly quasiconvex.

Strictly quasiconvex function. X is a convex set and f(ax + (1-a)y) < Max{f(x), f(y)} for all x, y in X for which f(x) not= f(y), and a is in (0, 1). (Note: f need not be quasiconvex – see Myth NLP-11.)

Strong duality. See dual

Strongly concave function. Negative is strongly convex.

Strongly convex function. Arises for f in C^2: eigenvalues of hessian are bounded away from zero (from below): there exists K > 0 such that h'H_f(x)h >= K||h||^2 for all h in R^n. For example, the function exp(-x) is strictly convex on R, but its second derivative is exp(-x), which is not bounded away from zero. The minimum is not achieved because the function approaches its infimum of zero without achieving it for any (finite) x. Strong convexity rules out such asymptotes.

Strongly quasiconcave function. Negative is strongly quasiconvex.

Strongly quasiconvex function (f on X). X is a convex set and f(ax + (1-a)y) < Max{f(x), f(y)} for all x, y in X, with x not= y, and a in (0, 1).

Structural variable. See linear program.

Subadditive function. f(x+y) <= f(x) + f(y) (where x, y in the domain implies x+y is in the domain).

Subdifferential (of f at x). Df(x) = {y: x is in argmax{vy - f(v): v in X}} (where D should be the partial derivative symbol, but such is an ASCII world). Also see conjugate function. If f is convex and differentiable with gradient, grad_f, Df(x) = {grad_f(x)}.

Example:   f(x) = | x |. Then, Df(0) = [-1, 1].

The subdiffenential is built on the concept of supporting hyperplane, generally used in convex analysis. When f is differentiable in a deleted neighborhood of x (but not necessarily at x), the B-subdifferential is the set of limit points:

DBf(x) = {d: there exists {xk} ––> x and {grad_f(xk)} ––> d}.

If f is continuously differentiable in a neighborhood of x (including x), DBf(x) = {grad_f(x)}. Otherwise, DBf(x) is generally not a convex set. For example, if f(x) = | x |, DBf(0) = {–1, 1}.

The Clarke subdifferential is the convex hull of DBf(x).

Subgradient. A member of the subdifferential.

Sublinear rate of convergence. See convergence.

Submodular function. Let N be a finite set and let f be a function on the subsets of N into R. Then, f is submodular if it satisfies:

Example: N={0,1}n (binary n-vectors) and f(S) = Sum{x in S} Sumj=1:n xj. Instance: n=2, S={(0,1), (1,1)}, T={(1,0),(1,1)}. Then, f(S\/T)=4, f(S)=3, f(T)=3, and f(S/\T)=2. (The linearity of Sum makes the inequality hold with inequality.)

Subspace. A subset of a vector space that is, itself, a vector space. An example is the null space of a matrix, as well as its orthogonal complement.

Substitution. See forward substitution and backward substitution.

Successive approximation. The iterative scheme by which an approximation is used for the basic design of an algorithm. The sequence generated is of the form x^(k+1) = x^k + A(x^k), where A is an algorithm map specified by its approximation to some underlying goal. Typically, this is used to find a fixed point, where A(x)=0 (e.g., seeking f(x)=x, let A(x) = f(x) - x, so the iterations are x^(k+1) = f(x^k), converging to x*=f(x*) if f satisfies certain conditions, such as a contraction map).

Sufficient matrix. Let A be an n×n matrix. Then, A is column sufficient if

[xi (Ax)i <= 0 for all i] [xi (Ax)i = 0 for all i].

A is row sufficient if its transpose is column sufficient. A is sufficient if it is both column and row sufficient. One example is when A is symmetric and
positive semi-definite. Here is an example of a matrix that is column sufficient, but not row sufficient:

This arises in linear complementarity problems.

Super-sparsity. The fraction of distinct nonzero values in a matrix. If a matrix has z nonzeros -i.e. z = sparsity × m × n, then the super-sparsity is number of distinct values divided by z. Although related to the idea of sparsity, a matrix can be highly super-sparse yet not sparse. For example if A(i,j) = 1 for all i,j, then sparsity is 0 but super-sparsity is 1/mn.

These matrices arise naturally in models with repetative substructures. For example, a process may be modeled as a linear program with activities that convert raw materials into finished products (like a refinery converting crude oil into petroleum products). The yields are determined by the engineering, so this model would be the same if embedded into a larger LP that uses it in numerous regions. Then, the yield values are replicated for each region, which makes the resulting LP matrix super-sparse. Additionally, any network problem in which the coefficient matrix is the incidence matrix is super-sparse. This leads to the special case of only zeros and ones.

Superadditive function. Negative is subadditive.

Superbasic variable. When using a method like the convex simplex method or projected gradient, the nonbasic variables are partitioned into those that are at one of their bound values and those that are not. The latter are called superbasic.

Superconsistent. Arises in geometric programming, and in semi-infinite programming as one that satisfies the Slater interiority condition.

Superlinear rate of convergence. See convergence.

Supermodular function. Negative is submodular.

Support set of x >= 0. {j: xj > 0}.

Supporting hyperplane of a set, S. A hyperplane that contains S in one of its closed halfspaces and intersects the closure of S with at least one point.

           ______________________  supporting hyperplane (line)
                   /  \
                  / S  \
Here is a graphic.

Suppose S is closed and convex. A key fact is that every supporting hyperplane contains an extreme point of S. If S is a polyhedron, the facets define a finite collection of supporting hyperplanes that completely determine the polyhedron (as the intersection of the associated halfspaces that contain S).

Supremum (abbr. Sup). This is the least upper bound on a (real-valued) function over (a subset of) its domain. If f is unbounded from above, Sup{f} = infinity, and if the domain is empty, Sup{f} = -infinity. Otherwise, suppose U is any upper bound: f(x) <= U for all x in X. U is a least upper bound if, for any e > 0, there exists x in the domain for which f(x) >= U-e. (That is, we can get arbitrarily close to U in the range of f.) Note that the supremum is always defined, and its range is in the extended reals. It is the maximum, if it is attained by some point in its domain.

Surplus variable. In an inequality constraint of the form g(x) >= b, the level of surplus is g(x)-b, which is designated by the surplus variable, s. Then, the original constraint is equivalent to the defining equation, g(x) - s = b, plus the non-negativity, s >= 0.

Surrogate constraint. An aggregation of the constraints, such as a (non-negative) linear sum of the (inequalities and) equations. Of special interest is the integer equivalent aggregation. (Also see the surrogate dual.)

Surrogate relaxation. Solving the surrogate dual problem. For given multipliers (u,v) (u >= 0), the surrogate dual objective is the relaxed mathematical program:

Max{f(x): x in X, ug(x) <= 0, vh(x) = 0}.

(More generally, the surrogate could be a nonlinear combination of the constraints.)

Symmetric dual. See it in the list of particular duals.

Symmetry exclusion. Excluding symmetric solutions, usually when solving combinatorial programs. For example, suppose xij = 1 if item i is assigned to be at position j (in 2 or 3 dimensional space); otherwise, xij=0. When branching on xij=1, the alternative optimal value x_ik=1 is being considered for position k that maps from j under reflection, translation, or rotation. So, when considering the complementary branch, xij=0, the symmetry exclusion condition, xik=0, is also added. Symmetry exclusion can sometimes be done by ordering variables, or with cutting planes. It arises in most problems with a geometric meaning, including the symmetric traveling salesman problem, where every tour, (1,2,...,n,1), has the symmetric solutions (2,3,...,n,1,2) and (1,n,n-1,...,2,1). The first of these symmetries is excluded by defining city 1 as the first (or "home") city. Fixing the home does not exclude the second symmetry, which is a tour reversal. Non-geometric problems can also have symmetries, such as graph coloring; any coloring has the symmetric solution by swapping colors – e.g., every red node becomes green, and every green node becomes red.



Send questions and comments to
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2014, Mathematical Programming Glossary, by the INFORMS Computing Society