
This version of the glossary is obsolete and is not maintained.
Mathematical Programming Glossary  S
f(x*, y) <= f(x*, y*) <= f(x, y*) for all x in X, y in Y.
von Neumann (1928) proved this equivalent to:
Inf{Sup{f(x, y): y in Y}: x in X} =
Sup{Inf{f(x, y): x in X}: y in Y} = f(x*, y*).
A sufficient condition for a saddle point to exist is that X and Y are nonempty, compact, convex sets, f(.,y) is convex on X for each y in Y, and f(x,.) is concave on Y for each x in X. Saddle point equivalence underlies duality.

Satisfiability problem (SAT). Find a truth assignment to logical propositions such that a (given) collection of clauses is true (or ascertain that at least one clause must be false in every truth assignment). This fundamental problem in computational logic forms the foundation for NP-completeness. When SAT is in CNF and may not be satisfiable, we define the problem of maximizing the number of true clauses as MAX-SAT.

Scaling. Changing the units of measurement, usually for the numerical stability of an algorithm. The variables are transformed as x' = Sx, where S = diag(s_{j}). The diagonal elements are the scale values, which are positive: s_{1}, ..., s_{n} > 0. Constraint function values can also be scaled. For example, in an LP, the constraints Ax = b can be scaled by RAx = Rb, where R = diag(r_{i}) with r > 0. (This affects the dual values.) Some LP scaling methods simply scale each column of A by dividing by its greatest magnitude (null columns are identified and removed).
Another method is logarithmic scaling, which scales by the logarithm of the greatest magnitude. More sophisticated methods are algorithmic, taking both row and column extremes into account.

Scatter search. This is a population-based metaheuristic, starting with a collection of reference points, usually obtained by the application of some heuristic. A new point is created by taking combinations of points in the population, and rounding elements that must be integer-valued. It bears some relation to a genetic algorithm, except that scatter search uses linear combinations of the population, while the GA ``crossover operation'' can be nonlinear.

Scheduling (e.g., jobs). A schedule for a sequence of jobs, say j1, ..., jn, is a specification of start times, say t1, ..., tn, such that certain constraints are met. A schedule is sought that minimizes cost and/or some measure of time, like the overall project completion time (when the last job is finished) or the tardy time (amount by which the completion time exceeds a given deadline). There are precedence constraints, such as in the construction industry, where a wall cannot be erected until the foundation is laid. There is a variety of scheduling heuristics. Two of these for scheduling jobs on machines are list heuristics: the Shortest Processing Time (SPT) and the Longest Processing Time (LPT). These rules put jobs on the list in nondecreasing and nonincreasing order of processing time, respectively. Other scheduling problems, which might not involve sequencing jobs, arise in production planning.

Search tree. The tree formed by a branch and bound algorithm strategy. It is a tree because at each (forward) branching step the problem is partitioned into a disjunction. A common one is to dichotomize the value of some variable, x <= v or x >= v+1. This creates two nodes from the parent:

                [parent node]
                /           \
       x <= v /             \ x >= v+1
             /               \
    [left child]        [right child]

Secant method. A method to find a root of a univariate function, say F.
The iterate is:

x^(k+1) = x^k - F(x^k) [x^k - x^(k-1)] / [F(x^k) - F(x^(k-1))].

If F is in C^2 and F''(x) not= 0, the order of convergence is the golden mean, say g (approx.= 1.618), and the limiting ratio is:

| F''(x) / (2 F'(x)) |^(g-1).

Second-order conditions. This descends from classical optimization, using the second-order term in Taylor's expansion. For unconstrained optimization, the second-order necessary condition (for f in C^2) is that the hessian is negative semidefinite (for a max). Second-order sufficient conditions are the first-order conditions plus the hessian be negative definite. For constrained optimization, the second-order conditions are similar, using projection for a regular mathematical program and the Lagrange Multiplier Rule. They are as follows (all functions are in C^2, and the mathematical program is in standard form, for x* a local maximum):
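The secant iteration above can be sketched in a few lines; this is an illustrative script, not part of the glossary, and the test function F(x) = x^2 - 2 and the starting points are made-up choices:

```python
def secant(F, x0, x1, tol=1e-12, max_iter=50):
    # x^(k+1) = x^k - F(x^k) * [x^k - x^(k-1)] / [F(x^k) - F(x^(k-1))]
    for _ in range(max_iter):
        f0, f1 = F(x0), F(x1)
        if f1 == f0:                 # flat secant: cannot divide
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x1 - x0) < tol:
            break
    return x1

# Illustrative use: a root of F(x) = x^2 - 2 is sqrt(2).
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

With these starting points the iterates converge superlinearly, reflecting the golden-mean order of convergence noted above.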
Self-concordance. Properties of a function that yield nice performance of Newton's method used for line search when optimizing a barrier function. Specifically, let B be a barrier function for S = {x in X: g(x) <= 0} with strict interior S^0. Let x be in S and let d be a direction vector in R^n such that the line segment [x-td, x+td] is in S for t in [0, t*], where t* > 0. Then, define F: [0, t*] --> R by:
F(t) = B(x + td)
(while noting that F depends on x and d).
The function F is self-concordant if it is convex, in C^{3},
and satisfies the following for all x and d:

|F'''(0)| <= 2F''(0)^{3/2}.

One calls F k-self-concordant in an open convex domain if

|F'''(0)| <= 2k F''(0)^{3/2}.
The logarithmic barrier function, associated with linear programming,
is self-concordant with k=1.
This further extends naturally to functions in R^{n}.
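As a numeric illustration (not part of the glossary entry), the one-dimensional logarithmic barrier B(x) = -log(x) for {x > 0} can be checked against the defining inequality with k = 1; the derivatives of F(t) = -log(x + td) are computed analytically below:

```python
def self_concordance_holds(x, d, k=1.0):
    # F(t) = -log(x + t*d)  =>  F''(0) = d^2/x^2,  F'''(0) = -2*d^3/x^3
    F2 = d * d / (x * x)
    F3 = -2.0 * d ** 3 / x ** 3
    # |F'''(0)| <= 2k F''(0)^(3/2), with a tiny tolerance for rounding
    return abs(F3) <= 2.0 * k * F2 ** 1.5 + 1e-12

# For the log barrier the inequality holds (in fact with equality) at k = 1.
ok = all(self_concordance_holds(x, d)
         for x in (0.5, 1.0, 3.0) for d in (-2.0, 0.25, 1.0))
```

Since |F'''(0)| = 2|d|^3/x^3 and 2F''(0)^{3/2} = 2|d|^3/x^3, the bound is tight for every x > 0 and every d.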
Semi-assignment problem. This is like the assignment problem, except only the rows or the columns (not both) of the assignment matrix are constrained to equal 1. With linear objective, the problem is:

Semidefinite program. Min{cx: S(x) in P}, where P is the class of positive semidefinite matrices, and S(x) = S_0 + Sum_j{x(j)S_j}, where each S_j, for j=0,...,n, is a (given) symmetric matrix. This includes the linear program as a special case.

Semi-infinite program. A mathematical program with a finite number of variables or constraints, but an infinite number of constraints or variables, respectively. The randomized program is a semi-infinite program because it has an infinite number of variables when X is not finite.

Sensitivity analysis. Under suitable assumptions, the multipliers in the Lagrange Multiplier Rule provide derivatives of the optimal response function – i.e., under certain conditions, (u, v) = grad_f*(b, c). For special approaches in LP, see compatibility theory, the 100% Rule, and the tolerance approach.

Separable program. Let {x^k} be a specified set of points, where x^k = [x(k,1), x(k,2), ..., x(k,n)], and let y = {y(k,j)} be decision variables that are the coefficients of convex combinations, giving the following linear program:
A restricted basis entry rule is invoked during the simplex method to yield an approximate solution. (However, this is dominated by the Generalized Lagrange Multiplier method, which can be viewed as generating the approximating breakpoints a posteriori, getting successively finer near the solution.) The delta form uses the differences: u(k, j) = x(k, j) - x(k-1, j). The associated functional differences are:
Then, the approximating LP is:

Max Sum_kj{Df(k, j) u(k, j)}:
Sum_kj{Dg(k, j) u(k, j)} <= b, Sum_kj{Dh(k, j) u(k, j)} = c, 0 <= u <= 1;

where b = Sum_j{g_j(x(0, j))} and c = Sum_j{h_j(x(0, j))} (a similar constant was dropped from the objective). Another restricted basis rule is invoked: u(k, j) > 0 implies u(k, q) = 1 for all q < j and all k.

Separating hyperplane. A hyperplane for which two (given) sets lie in opposite halfspaces. The separation is strict if the two sets are contained in their respective open halfspaces.

[Graphic: the sets S1 and S2 on opposite sides of a separating hyperplane (line).]

Sequencing problems. Finding an ordering, or permutation, of a finite collection of objects, like jobs, that satisfies certain conditions, such as precedence constraints.

Sequential decision process. See dynamic programming and Time-staged models.

Sequential Linear Programming (SLP). Solving a nonlinear program by a sequence of linear approximations and using linear programming to solve each one. The linear approximations are usually done by using the first-order Taylor expansion.

Sequential Quadratic Programming (SQP). Solving a nonlinear program by a sequence of quadratic approximations and using quadratic programming to solve each one. The approximations are usually done by using the second-order Taylor expansion.

Sequential Unconstrained Minimization Technique (SUMT). This is the penalty function approach.

Set covering problem. See covering problem.

Shadow price. An economic term to denote the rate at which the optimal value changes with respect to a change in some right-hand side that represents a resource supply or demand requirement. (This is sometimes taken as synonymous with the dual price, but this can be erroneous, as in the presence of degeneracy – see marginal price.) Also see pricing.

Sherman-Morrison formula. This is a useful identity:

[A + ab']^{-1} = A^{-1} - A^{-1}ab'A^{-1} / (1 + b'A^{-1}a),

where A is a nonsingular n×n matrix and a and b are n-vectors, provided the inverse (on the left) exists.

Shortest path. In a graph or network, this is a path from one node to another whose total cost is the least among all such paths.
The "cost" is usually the sum of the arc costs, but it could be another function (e.g., the product for a reliability problem, or max for a fuzzy measure of risk). There are some particular labeling algorithms given. Signomial. The difference between two posynomials. This class of function defines the general geometric program. More generally, some authors define a simplex to be the convex hull of any n+1 affinely independent vectors, and refer to the special case of the unit vectors as the standard simplex. Simplex method. An algorithm invented to solve a linear program by progressing from one extreme point of the feasible polyhedron to an adjacent one. The method is an algorithm strategy, where some of the tactics include pricing and pivot selection. The elementary simplex method is the name of Dantzig's original (1947) algorithm, with the following rules applied to the standard form: Min{cx: Ax=b, x >= 0}.
Among the variations are:
The revised simplex method is the use of a particular factored form of the basis: B = [E_1 E_2 ... E_k] (after k iterations), where each E_{i} is an elementary matrix. Then, the revised simplex method uses forward transformation to pivot and backward transformation to update the pricing vector. (For this distinction, the elementary simplex method is sometimes called the tableau method.) The simplex method draws its name from imagining a normalization constraint, Sum_{j}{x_{j}} = 1, and thinking of the jth column of A to be selected by the weight x_{j}. Then, at an iteration, an m-simplex is specified by the basic variables, and an adjacent simplex is chosen to improve the objective value. This view is in requirements space. (For a different algorithm, used as a heuristic in NLP, see the Nelder-Mead simplex method.)

Simplex multiplier. See pricing.

Simplicial subdivision. Given a simplex, S, its simplicial subdivision is a collection of simplices, say {T_{i}}, such that \/_{i}{T_i} = S and, for any i and j, either the intersection T_i /\ T_j is empty or equals the closure of a common face. The mesh of the subdivision is the diameter of the largest subsimplex. This arises in a fixed point approach to compute an economic equilibrium.

Simulated annealing. An algorithm for solving hard problems, notably combinatorial programs, based on the metaphor of how annealing works: reach a minimum energy state upon cooling a substance, but not too quickly in order to avoid reaching an undesirable final state. As a heuristic search, it allows a nonimproving move to a neighbor with a probability that decreases over time. The rate of this decrease is determined by the cooling schedule, often just a parameter used in an exponential decay (in keeping with the thermodynamic metaphor). With some (mild) assumptions about the cooling schedule, this will converge in probability to a global optimum.

Skew symmetric matrix (A). A is square and A' = -A.

Slack variable.
In an inequality constraint of the form g(x) <= b, the slack is b - g(x), which is designated by the slack variable, s. Then, the original constraint is equivalent to the defining equation, g(x) + s = b, plus s >= 0.

Slater's (interiority) condition. Originally for the purely inequality system with g convex, it means there exists x for which g(x) < 0. More generally, for a mathematical program in standard form, it means there exists x in X for which g(x) < 0 and h(x) = 0.

Spanning tree [problem]. A subgraph that is a tree containing all nodes. The max weight spanning tree problem is to find a spanning tree such that the sum of (given, positive) weights of the edges is a maximum.
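A greedy (Kruskal-type) method solves the max weight spanning tree problem exactly: sort edges by decreasing weight and keep each edge that joins two different components. The 4-node graph below is a made-up example, not from the glossary:

```python
def max_weight_spanning_tree(n, edges):
    """Greedy method: scan edges in decreasing weight order, keeping an
    edge whenever it connects two different components (union-find)."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    tree, total = [], 0
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
    return tree, total

# Hypothetical graph on nodes 0..3; edges are (weight, u, v).
edges = [(5, 0, 1), (3, 1, 2), (4, 0, 2), (2, 2, 3), (1, 0, 3)]
tree, total = max_weight_spanning_tree(4, edges)
```

Here the greedy scan keeps edges of weight 5, 4, and 2, for a total weight of 11.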
Sparsity. The fraction of nonzero entries in a matrix: if A is m by n, and A(i, j) ≠ 0 for k of its elements, its sparsity is k/(mn). Large linear programs tend to be very sparse, increasingly so as the dimensions grow. For example, consider the standard transportation problem with s sources and d destinations. This has m = s+d constraints and n = sd variables. Each column, however, has exactly 2 nonzeroes since A is the incidence matrix of the network, so its sparsity is 2n/(mn) = 2/m, which decreases as the number of sources and/or destinations grows large. The sparsity of a simple graph (or network) is the sparsity of its adjacency matrix. More generally, the sparsity of a multigraph refers to the average degree of its nodes. See supersparsity for extensions of this idea.

Specially ordered set (SOS). These are sets of nonnegative variables that are required to sum to 1. For computational efficiency, it is sometimes better to define these sets by some marking data structure, rather than include them along with other equality constraints. There are two types of SOSs, distinguished by what they represent. A Type 1 SOS is when each variable is binary, so the constraint is one of multiple choice. A Type 2 SOS is when a restricted basis entry rule is used, as in the lambda-form of separable programming.

Spectral radius of a matrix, A. The radius of a disk that contains the spectrum: r(A) = Max{|y|: y is an eigenvalue of A}.

Spectrum of a matrix, A. Its set of eigenvalues.

Speed of convergence. See convergence.

Stability region. The set of parameter values for which an optimal solution remains optimal. This arises naturally when solving combinatorial programs, where a solution is often a subgraph, such as a tree, and the question is for what range of arc weights is this subgraph optimal (such as a spanning tree that is minimum for given weights). More generally, x could be a solution generated by some algorithm, A, from an initial value x^{0}.
Then, suppose the feasibility region depends on the parameter p, say F(p), and the objective also depends on p, say f(x; p). Let X(p, A, x^{0}) denote the generated solution from algorithm A, starting at x^{0}, with parameter value p. Let the parameter set be P (which includes p*). The stability region of x* = X(p*, A, x^{0}) is {p in P: x* = X(p, A, x^{0})}. The algorithm may be a heuristic, so x* need not be optimal. For example, one could use an n-Opt heuristic for the traveling salesman problem, so x represents a tour. The parameters could be the costs, or they could be the location of each point in a Euclidean TSP. The stability region is the set of costs, or coordinates in the plane, for which the tour generated by n-Opt is the same. The first stability condition pertains to upper semicontinuity and the second, called the closure condition, pertains to lower semicontinuity.
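Returning to the Sparsity entry above, the 2/m figure for the transportation problem can be verified by building the incidence matrix explicitly; this small script is illustrative only:

```python
def transportation_sparsity(s, d):
    """Build the incidence matrix of the s-source, d-destination
    transportation problem (m = s+d rows, n = s*d columns) and
    return the fraction of nonzero entries, k/(m*n)."""
    m, n = s + d, s * d
    A = [[0] * n for _ in range(m)]
    for i in range(s):
        for j in range(d):
            col = i * d + j        # column for shipping route (i, j)
            A[i][col] = 1          # source row
            A[s + j][col] = 1      # destination row
    k = sum(1 for row in A for a in row if a != 0)
    return k / (m * n)
```

With s = 3 and d = 4, each of the 12 columns has exactly 2 nonzeros, so the sparsity is 24/84 = 2/7 = 2/m, as claimed in the entry.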
Stable set. Same as independent set.

Standard linearization. The standard way to linearize the product of two binary variables x and y is to replace xy with a continuous variable w and add four linear inequalities as auxiliary constraints: w <= x, w <= y, w >= x + y - 1, and w >= 0. Collectively, these imply w = xy for all binary values of x and y. This can be generalized for the product of binary variables x_j for all j in some index set J by replacing prod_{j in J} x_j with a continuous variable w and adding |J| + 2 auxiliary constraints: w <= x_j for all j in J, w >= sum_{j in J} x_j - (|J| - 1), and w >= 0.

Stationary point. Usually this is used to mean a Kuhn-Tucker point, which specializes to one for which grad_f(x) = 0 if the mathematical program is unconstrained. In the context of an algorithm, it is a fixed point.

Stationary policy. In a dynamic program, this is a policy that is independent of time – i.e., x*(s, t) = T(s) (some function of state, but not of time, t).

Steel beam assortment problem. A steel corporation manufactures structured beams of a standard length, but a variety of strengths. There is a known demand of each type of strength, but a stronger one may fulfill demand (or part thereof) for another beam (but not conversely). The manufacture of each type of steel beam involves a fixed charge for its setup. In addition, there is a shipping cost proportional to the difference in the demanded strength and the actual strength, and proportional to the quantity shipped.
Although t does not index time, the mathematical program for this problem is the same form as the production scheduling problem, using the inventory balance equations to relate y and x. This is valid because s_1 >= s_2 >= ... >= s_N implies y(t) can be used to fulfill demand D(t+1) + D(t+2) + ... + D(N). (Also, note that here h_t is not a "holding cost".)

Steepest ascent (descent, if minimizing). This is a class of algorithms, where x' = x + sd, such that the direction vector d is chosen by maximizing the initial "velocity" of change, and the step size (s) is chosen by line search. Generally used in the context of unconstrained optimization, the mathematical program is Max{f(x): x in R^n}, where f is in C^1. (For descent, change Max to Min.) Then, d is chosen to maximize the first-order Taylor approximation, subject to a normalization constraint: Max{grad_f(x)d: ||d|| = 1}, where ||d|| denotes the norm of the direction vector, d. When the Euclidean norm is used, this yields the original steepest ascent algorithm by Cauchy, which moves in the direction of the gradient:
d = grad_f(x) / ||grad_f(x)||.
(No direction vector is sought if grad_f(x)=0; such algorithms stop
when reaching a stationary point.)
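A minimal sketch of Cauchy's method for minimization follows; as a simplifying assumption it uses a fixed step size s in place of a line search, and the quadratic objective is invented for illustration:

```python
def steepest_descent(grad, x, s=0.05, tol=1e-8, max_iter=10_000):
    """x' = x + s*d with d = -grad f(x)/||grad f(x)|| (Cauchy's direction).
    Fixed step size here, so the iterates settle within about s of the
    minimizer rather than converging exactly."""
    for _ in range(max_iter):
        g = grad(x)
        norm = sum(gi * gi for gi in g) ** 0.5
        if norm < tol:                        # stationary point reached
            break
        x = [xi - s * gi / norm for xi, gi in zip(x, g)]
    return x

# Minimize f(x) = (x1 - 1)^2 + (x2 + 2)^2; grad f = (2(x1-1), 2(x2+2)).
xmin = steepest_descent(lambda x: [2 * (x[0] - 1), 2 * (x[1] + 2)], [0.0, 0.0])
```

For this objective the normalized gradient always points away from the minimizer (1, -2), so the iterates walk straight toward it and then oscillate within one step length of it.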
Other norms, such as ||d||^2 = d'Qd, where Q is symmetric and positive definite, lead to other directions that are steepest relative to that norm. In particular, if Q = H_f(x), this yields the modified Newton method.

Steiner problem. Find a subgraph of a graph, say G' = [V', E'], such that V' contains V* (a specified subset of nodes), and Sum{c(e): e in E'} is minimized. It is generally assumed c >= 0. When |V*| = 2, this is the shortest path problem. When V* = V, this is the (minimum) spanning tree problem.

Step size. This is a scalar (s) in an algorithm of the form x' = x + sd, where d is the direction vector. After d is chosen (nonzero), the step size is specified. One step size selection rule is line search; another is a fixed sequence, like s_k = 1/k.

Stochastic matrix. A nonnegative matrix whose row sums are each 1. (A column stochastic matrix is one whose column sums are 1.) This arises in dynamic programs whose state transition is described by a stochastic matrix containing the probabilities of each transition.

Stochastic program. It is written in the form of a mathematical program extended to a parameter space whose values are random variables (generally with known distribution function). This is converted to a standard form by forming a certainty equivalent.
Also see robust optimization. The certainty equivalent depends upon the underlying decision process. If it is adaptive, the recourse model applies (but RO might be more practical). The chance constraint model represents a notion of an allowed frequency of violations, as in environmental control models.

Strict interior. Let {x in X: g(x) <= b} be the level set of g. Then, its strict interior is {x in X: g(x) < b}. (This is not to be confused with the relative interior or the set interior – see Myth NLP6.)

Strict optimum. Same as proper optimum.

Strictly complementary. Each complementary pair of variables must have exactly one zero (the other positive).

Strictly concave function. Negative is strictly convex.

Strictly convex function. A convex function that also satisfies the defining inequality strictly for distinct points, say x and y:
f(ax + (1-a)y) < af(x) + (1-a)f(y) for all a in (0, 1).
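The defining inequality can be exercised numerically; the sample points below are arbitrary choices for illustration:

```python
# f(x) = x^2 is strictly convex: the inequality is strict whenever
# x != y and a is in (0, 1).
f = lambda x: x * x
strict = all(
    f(a * x + (1 - a) * y) < a * f(x) + (1 - a) * f(y)
    for (x, y) in [(-1.0, 2.0), (0.0, 3.0), (-2.5, -0.5)]
    for a in (0.1, 0.5, 0.9)
)

# By contrast, |x| is convex but not strictly convex: equality holds
# for distinct points of the same sign.
not_strict = abs(0.5 * 1 + 0.5 * 3) == 0.5 * abs(1) + 0.5 * abs(3)
```

The second check shows why "convex" alone does not imply the strict version of the inequality.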
Strictly quasiconcave function. Negative is strictly quasiconvex.

Strictly quasiconvex function. X is a convex set and f(ax + (1-a)y) < Max{f(x), f(y)} for all x, y in X for which f(x) not= f(y), and a is in (0, 1). (Note: f need not be quasiconvex – see Myth NLP11.)

Strongly concave function. Negative is strongly convex.

Strongly convex function. Arises for f in C^2: eigenvalues of the hessian are bounded away from zero (from below): there exists K > 0 such that h'H_f(x)h >= K||h||^2 for all h in R^n. For example, the function exp(x) is strictly convex on R, but its second derivative is exp(x), which is not bounded away from zero. The minimum is not achieved because the function approaches its infimum of zero without achieving it for any (finite) x. Strong convexity rules out such asymptotes.

Strongly quasiconcave function. Negative is strongly quasiconvex.

Strongly quasiconvex function (f on X). X is a convex set and f(ax + (1-a)y) < Max{f(x), f(y)} for all x, y in X, with x not= y, and a in (0, 1).

Structural variable. See linear program.

Subadditive function. f(x+y) <= f(x) + f(y) (where x, y in the domain implies x+y is in the domain).

Subdifferential (of f at x). Df(x) = {y: x is in argmax{vy - f(v): v in X}} (where D should be the partial derivative symbol, but such is an ASCII world). Also see conjugate function. If f is convex and differentiable with gradient grad_f, Df(x) = {grad_f(x)}. Example: f(x) = |x|. Then, Df(0) = [-1, 1]. The subdifferential is built on the concept of supporting hyperplane, generally used in convex analysis. When f is differentiable in a deleted neighborhood of x (but not necessarily at x), the B-subdifferential is the set of limit points: D_{B}f(x) = {d: there exists {x^{k}} --> x and {grad_f(x^{k})} --> d}. If f is continuously differentiable in a neighborhood of x (including x), D_{B}f(x) = {grad_f(x)}. Otherwise, D_{B}f(x) is generally not a convex set.
For example, if f(x) = |x|, D_{B}f(0) = {-1, 1}. The Clarke subdifferential is the convex hull of D_{B}f(x).

Subgradient. A member of the subdifferential.

Sublinear rate of convergence. See convergence.

Submodular function. f(S\/T) + f(S/\T) <= f(S) + f(T). Example: N={0,1}^{n} (binary n-vectors) and f(S) = Sum_{x in S} Sum_{j=1:n} x_{j}. Instance: n=2, S={(0,1), (1,1)}, T={(1,0), (1,1)}. Then, f(S\/T)=4, f(S)=3, f(T)=3, and f(S/\T)=2. (The linearity of Sum makes the inequality hold with equality.)

Subspace. A subset of a vector space that is, itself, a vector space. An example is the null space of a matrix, as well as its orthogonal complement.

Substitution. See forward substitution and backward substitution.

Successive approximation. The iterative scheme by which an approximation is used for the basic design of an algorithm. The sequence generated is of the form x^(k+1) = x^k + A(x^k), where A is an algorithm map specified by its approximation to some underlying goal. Typically, this is used to find a fixed point, where A(x) = 0 (e.g., seeking f(x) = x, let A(x) = f(x) - x, so the iterations are x^(k+1) = f(x^k), converging to x* = f(x*) if f satisfies certain conditions, such as a contraction map).
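The fixed point iteration x^(k+1) = f(x^k) just described can be sketched directly; the choice f = cos is a classic illustrative contraction, not part of the glossary entry:

```python
import math

def fixed_point(f, x, tol=1e-12, max_iter=1000):
    """Iterate x^(k+1) = f(x^k); converges to x* = f(x*) when f is a
    contraction near x*."""
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# cos is a contraction near its unique fixed point (about 0.739085).
xstar = fixed_point(math.cos, 1.0)
```

Each iteration shrinks the error by roughly the factor |cos'(x*)| ≈ 0.67, a linear rate of convergence.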
Sufficient matrix. Let A be an n×n matrix. Then, A is column sufficient if
[x_{i} (Ax)_{i} <= 0 for all i] implies
[x_{i} (Ax)_{i} = 0 for all i].
A is row sufficient if its transpose is column sufficient.
A is sufficient if it is both column and row sufficient.
One example is when A is symmetric and
positive semidefinite.
Here is an example of a matrix that is column sufficient, but not row
sufficient:
Supersparsity. The fraction of distinct nonzero values in a matrix. If a matrix has z nonzeros, i.e., z = sparsity × m × n, then the supersparsity is the number of distinct values divided by z. Although related to the idea of sparsity, a matrix can be highly supersparse yet not sparse. For example, if A(i, j) = 1 for all i, j, then the sparsity is 1 (no zeros at all) but the supersparsity is 1/(mn). These matrices arise naturally in models with repetitive substructures. For example, a process may be modeled as a linear program with activities that convert raw materials into finished products (like a refinery converting crude oil into petroleum products). The yields are determined by the engineering, so this model would be the same if embedded into a larger LP that uses it in numerous regions. Then, the yield values are replicated for each region, which makes the resulting LP matrix supersparse. Additionally, any network problem in which the coefficient matrix is the incidence matrix is supersparse. This leads to the special case of only zeros and ones.

Superadditive function. Negative is subadditive.

Superbasic variable. When using a method like the convex simplex method or projected gradient, the nonbasic variables are partitioned into those that are at one of their bound values and those that are not. The latter are called superbasic.

Superconsistent. Arises in geometric programming, and in semi-infinite programming as one that satisfies the Slater interiority condition.

Superlinear rate of convergence. See convergence.

Supermodular function. Negative is submodular.

Support set of x >= 0. {j: x_{j} > 0}.

Supporting hyperplane of a set, S. A hyperplane that contains S in one of its closed halfspaces and intersects the closure of S with at least one point.

[Graphic: a set S touching a supporting hyperplane (line) along one side.]

Suppose S is compact and convex. A key fact is that every supporting hyperplane contains an extreme point of S.
If S is a polyhedron, the facets define a finite collection of supporting hyperplanes that completely determine the polyhedron (as the intersection of the associated halfspaces that contain S).

Supremum (abbr. Sup). This is the least upper bound on a (real-valued) function over (a subset of) its domain. If f is unbounded from above, Sup{f} = infinity, and if the domain is empty, Sup{f} = -infinity. Otherwise, suppose U is any upper bound: f(x) <= U for all x in X. U is a least upper bound if, for any e > 0, there exists x in the domain for which f(x) >= U - e. (That is, we can get arbitrarily close to U in the range of f.) Note that the supremum is always defined, and its range is in the extended reals. It is the maximum if it is attained by some point in its domain.

Surrogate constraint. An aggregation of the constraints, such as a (nonnegative) linear sum of the (inequalities and) equations. Of special interest is the integer equivalent aggregation. (Also see the surrogate dual.)

Surrogate relaxation. Solving the surrogate dual problem. For given multipliers (u, v) with u >= 0, the surrogate dual objective is the relaxed mathematical program:
Max{f(x): x in X, ug(x) <= 0, vh(x) = 0}.
(More generally, the surrogate could be a nonlinear combination of the
constraints.)
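Because the surrogate constraint ug(x) <= 0 only enlarges the feasible set, the surrogate optimum bounds the true maximum from above. A toy illustration (the 0-1 program below is invented for this sketch):

```python
import itertools

# Hypothetical 0-1 program: Max f(x) s.t. g1(x) <= 0, g2(x) <= 0, x in {0,1}^2.
f = lambda x: 3 * x[0] + 3 * x[1]
g = lambda x: (x[0] - 0.5, x[1] - 0.5)   # forces x = (0, 0) if enforced exactly

X = list(itertools.product((0, 1), repeat=2))
true_opt = max(f(x) for x in X if all(gi <= 0 for gi in g(x)))

u = (1.0, 1.0)   # surrogate multipliers, u >= 0
surrogate_opt = max(f(x) for x in X
                    if sum(ui * gi for ui, gi in zip(u, g(x))) <= 0)
```

Here the original constraints admit only x = (0, 0), so the true optimum is 0, while the aggregated constraint x1 + x2 <= 1 also admits (1, 0) and (0, 1), giving the weaker (upper) bound 3.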
Symmetric dual. See it in the list of particular duals.

Symmetry exclusion. Excluding symmetric solutions, usually when solving combinatorial programs. For example, suppose x_{ij} = 1 if item i is assigned to be at position j (in 2- or 3-dimensional space); otherwise, x_{ij} = 0. When branching on x_{ij} = 1, the alternative optimal value x_{ik} = 1 is being considered for position k that maps from j under reflection, translation, or rotation. So, when considering the complementary branch, x_{ij} = 0, the symmetry exclusion condition, x_{ik} = 0, is also added. Symmetry exclusion can sometimes be done by ordering variables, or with cutting planes. It arises in most problems with a geometric meaning, including the symmetric traveling salesman problem, where every tour, (1, 2, ..., n, 1), has the symmetric solutions (2, 3, ..., n, 1, 2) and (1, n, n-1, ..., 2, 1). The first of these symmetries is excluded by defining city 1 as the first (or "home") city. Fixing the home does not exclude the second symmetry, which is a tour reversal. Nongeometric problems can also have symmetries, such as graph coloring; any coloring has the symmetric solution obtained by swapping colors – e.g., every red node becomes green, and every green node becomes red.
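The two tour symmetries just described can be excluded programmatically; this small enumeration is illustrative, fixing city 0 as the home city (excluding rotations) and keeping only one orientation of each tour (excluding reversals):

```python
import itertools

def distinct_tours(n):
    """Tours of the symmetric TSP on cities 0..n-1, with rotational and
    reversal symmetries excluded."""
    tours = []
    for perm in itertools.permutations(range(1, n)):
        if perm[0] < perm[-1]:       # keep one of each reversed pair
            tours.append((0,) + perm)
    return tours
```

Fixing the home city leaves (n-1)! orderings, and discarding reversals halves that, so n = 5 yields 4!/2 = 12 distinct tours.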
Send questions and comments to icsMPGlossary@mail.informs.org. View the INFORMS Computing Society's Editorial Board. Copyright © 1996–2014, Mathematical Programming Glossary, by the INFORMS Computing Society.