Mathematical Programming Glossary - N
Mathematical Programming Glossary - N

Near optimal. A point x that is within a small value, say e > 0, of optimality in some sense. The following are the common measures of nearness:

  • in value: x feasible and f(x) >= f(x*) - e.
  • in policy: ||x - x*|| <= e.
  • in resource level: x in argmax{f(y): y in X, g(y) <= b, h(x) = c}, where ||(b, c)|| <= e.

Nearest neighbor algorithm. A type of greedy algorithm for combinatorial programs where there is measure of nearness between neighbors. An example is the traveling salesman problem, where the procedure is to choose the next city to be one that is nearest the current city in the sequence.

Negative definite matrix (A). x'Ax < 0 for all nonzero x.

Negative semi-definite matrix (A). x'Ax <= 0 for all x.

Neighborhood. For a normed space, the neighborhood of a point, x, is the open ball centered at x: {y: ||y-x|| < e}, where e > 0. The closed ball, with ||y-x|| <= e, is also a neighborhood. The usual notation (in this glossary) is N_e(x), and context dictates whether it is open or closed. This extends to the neighborhood of a set S by taking the union; equivalently, N_e[S] = {y: ||y-x|| <[=] e for some x in S}.

In integer programming, the neighborhood could mean integer-valued neighbors of the form |x-x*| <= 1. In a combinatorial program, where variables are binary, integer-valued neighbors comprise all members of {0,1}^n. In this case the neighborhood is defined relative to some subset of binary variables reachable by some operation that depends on the problem. This is what is generally meant by a neighborhood in heuristic search in general, and simulated annealing or tabu search in particular, where a move is defined in terms of neighbors. In that context, a neighborhood could be as simple as complementing the value of one variable, as deletions or additions in a knapsack problem, or it could be more complex, like a n-Opt neighbor of a TSP tour.

Nelder-Mead simplex method. This is a heuristic search (not guaranteed to find a solution) for unconstrained optimization. Let {x^0, x^1, ..., x^n} be n+1 points in R^n that define a simplex. Define the extreme values: f(x^u) = max{f(x^0), ..., f(x^n)} and f(x^l) = min{f(x^0), ..., f(x^n)}. Seeking a maximum of f in R^n, the idea is to replace x^l with a point having a better objective value.

    Here is an iteration:
  1. Define the centroid of the simplex without this point of least value: c = Sum_i{x^i: i not= l}/n.
  2. reflection: compute r = c + a(c - x^l), where a > 0 (a is called the "reflection coefficient").
  3. expansion: If f(r) > f(x^l) (i.e., we have a better point), compute x = c + b(c - x^l), where b > 1 (b is called the "expansion coefficient"). If f(x) > f(r), x replaces x^l, and the iteration is complete. Otherwise, r replaces x^l, and the iteration is complete.
  4. At this step, the reflection point (r) is not better than the worst of the simplex vertices (i.e., f(r) <= f(x^l)). This is divided into the following cases.
    1. If min{f(x^i): i not= l} <= f(r) <= f(x^u), replace x^l with r.
    2. If min{f(x^i): i not= l} > f(r), define x* as the better of the two: f(x*) = max{f(x^l), f(r)} (with x* = x^l or r, resp.). Then, take a contraction step: x = c + d(x* - c), where 0 < d < 1 (d is called the "contraction coefficient"). If f(x) > f(x*), x replaces x^l, and the iteration is complete. Otherwise, the last resort is to replace all x^i with: x'^i = (x^i + x^u)/2.

Network. A collection of nodes, V, sometimes called vertices, plus a collection of arcs, A, which are directed from one node to another. The two sets form a network, denoted N=[V,A]. As such, it can be considered a directed graph (see other terms, like special graphs).

    Here are associated functions and data values:
  • tail of k-th arc (i, j) is node i; we sometimes write T(k).
  • head of k-th arc (i, j) is node j; we sometimes write H(k).
  • in-degree of node i is |{k: H(k)=i}|.
  • out-degree of node i is |{k: T(k)=i}|.
  • arc capacity limits the total flow across the arc at any one time.
  • node capacity limits the total flow through a node at any one time.
  • supply or demand at a node provides external input or an output requirement.

Network flows. This is an assignment of arc values, called flows, say f(k) for the k-th arc, that satisfy two types of constraints: (1) arc bounds, L <= f <= U, and (2) node balances, Flow out of node i - Flow into node i = b(i). The flow out of node i can be expressed as Sum_k{f(k): T(k)=i}, and the flow into node i can be expressed as Sum_k{f(k): H(k)=i}.

If b(i) < 0, -b(i) is a supply at node i (called a supply node); if b(i) > 0, b(i) is a demand at node i (called a demand node). If b(i)=0, node i is simply a transshipment node, and the balance equation says that the flow into node i must equal the flow out of node i. Another way to express the node flow balance equations is with the node-arc incidence matrix: Mf=b.

Still another representation is to define flow variables, x(i, j) on V×V, which describes how much flow goes from node i to node j. This can be used as long as there are no parallel arcs – i.e., no two arcs have both the same tail and the same head. (In some applications, parallel arcs are needed, such as to increase capacity across a pair of arcs with an increased cost.) In this form, the capacity constraints are still of the form L <= x <= U, but the node equations have a different form:

Sum_j{x(i, j): (i, j) in A} - Sum_j{x(j, i): (j, i) in A} = b(i) for all i in V.

The (linear) min cost network flow problem is to minimize total cost, Sum_ij{c(i,j)x(i,j): (i,j) in A}, where c is a unit cost of flow, subject to the flow bounds and balance equations.

Neural network (also called artificial neural network, abbr. ANN). A network where the nodes correspond to neurons and the arcs correspond to synaptic connections in the biological metaphor. The following properties are included:

neural state. Each node has a state variable, say x. In the brain, this could be the potassium level; in computing applications, it could be anything the modeler chooses.

arc weight. Each arc has a weight that affects the state of its neighboring nodes when firing. If the weight is positive, it said to be excitatory; if it is negative, it is inhibitory.

state equations. The neural states change by some differential (or difference) equation, say x' = F(x,w,t). Typically (but not necessarily), -F is the gradient of an energy function (in keeping with the biological metaphor), say F(x,w,t) = -grad_x[E(x,w,t)], so that x(t) follows a path of steepest descent towards a minimum energy state.

learning mechanism. This could be equations to change the weights: w' = L(x,w,t). Various learning mechanisms are represented this way, including a form of supervised learning that uses a training set to provide feedback on errors. Other elements can be learned besides the arc weights, including the topology of the network.

Newsboy problem. A newspaper is concerned with controlling the number of papers to be distributed to newstands. The cost of a paper varies (i.e., Sunday vs. daily), and the demand is a random variable, q, with probability function P(q). Unsold papers are returned, with no salvage value the next day, losing millions of dollars in the production cost. The demand for newspapers is a random variable, with probability function P(q) = probability that demand equals q. It is possible, however, for a newstand to order more papers the same day. There are holding and shortage (penalty) costs. The problem is to determine a reorder policy so as to minimize total expected cost. This problem was used to consider a reorder policy with a 2-parameter decision rule:

  1. s = inventory level at which an order is placed;
  2. S = inventory level to which to order.

Then, the decision rule to be employed is the following policy:

Order nothing if the inventory of papers is >= s;
Order S-s if there are s papers on hand and s < S.

The significance of this problem is that it was used to introduce the notion (and optimality) of an (s, S) policy in inventory theory.

Newton's method (also called Newton-Raphson method). This is the iterative scheme to find a zero of a function:

where F:R^n–>R^n (F is in C^1) and J(x) is the jacobian of F (assumed nonsingular). In mathematical programming this is used to find a stationary point, where F=grad_f and J=H_f. Lacking global convergence, this leads to the modified Newton method, sometimes called the damped Newton method. (See the associated myth, Myth NLP-13 to avoid misconception.)

No-free-lunch theorem. In heuristic search, this is the proposition that all methods perform the same when averaged over all possible objective functions. The idea is that a particular search algorithm, like simulated annealing, may be designed to perform especially well for some functions, compared to a genetic algorithm, but when applied to a representative sample of all possible costs, they will perform exactly the same, on the average. This implies that to do especially well on some problems, the search method must do worse on others; hence, the "no-free-lunch" description.

Node packing problem. See Packing problem.

Nonbasic. A variable or column that is not basic.

Nondegeneracy. A nondegenerate solution is one that is not degenerate. A nondegeneracy assumption is one that ensures every optimal solution is not degenerate. In general, a nondegenerate solution is strictly complementary. In linear programming, nondegeneracy is equivalent to uniqueness in both the primal and the dual.

Nonlinear program (NLP). At least one of the functions (objective or constraint) is not affine on X. (Usually, it is the rule of the function that classifies it as nonlinear. In particular, a linear integer program is not generally considered an NLP.)

n-Opt. This is a heuristic for the TSP, though it can apply more generally to a problem that seeks a subgraph with all nodes. It is a local search, which considers a move by removing n edges from a trial solution, then replacing them with some other n edges. In TSP, the 2-Opt neighbors of the tour (1,2,...,n) are pair-wise exchanges. The set of exchanges, however, is larger than the set of 2-Opt neighbors. For example, consider n=4, so the original tour is (1,2,3,4). The six pair-wise exchanges of this permutation are:

      (2,1,3,4):  (1)*--(2)          (3,2,1,4):  (1)*--(2)
                    \  *                          |     *
                     \/                           |     |
                     /\                           |     |
                    /  *                          *     |
                  (4)*--(3)                      (4)--*(3)

      (4,2,3,1):  (1)   (2)          (1,3,2,4):  (1)   (2)
                   |*  * |                        |*  * |
                   | \/  |                        | \/  |
                   | /\  |                        | /\  |
                   */  \ *                        */  \ *
                  (4)   (3)                      (4)   (3)

      (1,4,3,2):  (1)*--(2)          (1,2,4,3):  (1)--*(2)
                   |     *                         *  /  
                   |     |                          \/   
                   |     |                          /\   
                   *     |                         *  \  
                  (4)--*(3)                      (4)--*(3) 
We see duplicates due to the equivalence of any cyclic permutation. Also, there are only two 2-Opt moves if the TSP is symmetric:
                  (1)---(2)                      (1)   (2)
                    \  /                          |\  / |
                     \/                           | \/  |
                     /\                           | /\  |
                    /  \                          |/  \ |
                  (4)---(3)                      (4)   (3) 
For n=2, the replacement is unique – that is, once two edges are chosen for removal, there is only one replacement. This is not true for n > 2, and part of the heuristic is to decide what replacement to make.

Norm. This is a function of a vector, say ||x||, that satisfies three properties:

  1. Homogeneous: ||tx|| = |t| ||x|| for all (scalars), t.
  2. Positive: ||x|| > 0 for x not= 0. (Note: ||0|| = 0 by homogeneity, so 0 is the unique vector with zero norm.)
  3. Subadditive: ||x + y|| <= ||x|| + ||y||.

Norms that arise frequently in mathematical programming are:

Euclidean norm (on R^n): ||x|| = sqrt(Sum_j{x_j^2})

L_inf (on R^n): ||x|| = max_j{|x_j|} (= lim L_p as p-->inf)

L_p (on R^n, for p >= 1): ||x|| = [Sum_j{|x_j|^p}]^(1/p)

Matrix norm (induced by vector norm): ||A|| = Max{||Ax||: ||x||=1}

Supremum norm (on function space): ||F|| = sup{|F(x)|: x in X}

Normal cone. For a convex set S, the normal cone to S at x, for x in S, is {y : <v - x, y> <= 0 for all v in S}, where < * > is an inner product.

Northwest corner rule. This constructs a feasible shipment for the transportation problem, from the cost table (c_ij = unit cost from source i to destination j), as follows. Start at the NW corner (source 1, destination 1), and allocate the most possible: the min of the supply and demand. If supply exceeds the demand, proceed to the next destination, and continue until all of supply 1 is allocated. Then, go to source 2 and repeat the allocation process, starting with the first (lowest index) destination whose demand has not been fulfilled. If demand exceeds the supply, proceed to the next source, and continue until all of demand 1 is allocated. Then, go to destination 2 and repeat the allocation process. Eventually, all demand must be fulfilled if the problem is feasible (i.e., total supply >= total demand).

NP-complete. Problems are divided into two categories: those for which there exists an algorithm to solve it with polynomial time complexity, and those for which there is no such algorithm. We denote the former class of problems by P. There are problems for which no known algorithm exists that solves it in polynomial time, but there is also no proof that no such algorithm exists. Among these problems that are not known to be in P (or in ~P), there is a subclass of problems known as NP-complete: those for which either all are solvable in polynomial time, or none are. Formally, a problem is NP if there exists an algorithm with polynomial time complexity that can certify a solution. For example, it is not known whether there exists a polynomial algorithm to solve a system of Diophantine equations, Ax=b for x in Z^n (integer n-vectors). However, we can certify any trial x in polynomial time, just by checking that it is in Z^n, then multiplying by A to compare with b. A problem, p, is NP-complete if it is NP and for any problem in NP, there exists a polynomial time algorithm to reduce it to p. A fundamental member of the NP-complete class is the satisfiability problem. It is an open question whether P=NP, and most regard the NP-complete problems as having exponential time complexity. Further details are in the supplement, Introduction to Computational Complexity.

NP-hard. An optimization problem that relies upon the solution of an NP-complete problem. In that sense, NP-hard problems are at least as hard as NP-complete problems.

Null space of matrix A. {x: Ax=0}. It is the orthogonal complement of its row space, {x: x=yA for some y in R^m}. Its dimension is n - rank(A), which complements the subspace spanned by its row space.

 


Notation

Send questions and comments to icsMPGlossary@mail.informs.org.
View the INFORMS Computing Society's Editorial Board
Copyright© 1996 – 2008, Mathematical Programming Glossary, by the INFORMS Computing Society