Critical Points and the Hessian

Critical points from the gradient

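A critical point of a differentiable function $f(x, y)$ is a point $(x_0, y_0)$ where the gradient vanishes:

$$\nabla f(x_0, y_0) = \mathbf{0}, \qquad \text{that is,} \qquad f_x(x_0, y_0) = 0 \ \text{ and } \ f_y(x_0, y_0) = 0.$$

See Directional Derivatives and the Gradient for the full derivation of $\nabla f$ and the formula $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$. That formula has an immediate consequence here: if $\nabla f(x_0, y_0) = \mathbf{0}$, then $D_{\mathbf{u}} f(x_0, y_0) = 0$ for every unit vector $\mathbf{u}$. The directional derivative is zero in every direction. Geometrically, the surface $z = f(x, y)$ is momentarily flat at $(x_0, y_0)$: it has a horizontal tangent plane.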

But “flat” is ambiguous. A hilltop is flat at the summit (local maximum). A valley floor is flat at the bottom (local minimum). A mountain pass is flat at the saddle, where it curves up in one direction and down in another (saddle point). The gradient alone cannot distinguish these cases because it only captures first-order behaviour. To classify a critical point, we need second-order information: the second partial derivatives of $f$.

Second-order Taylor expansion at a critical point

The tool for extracting second-order behaviour is the Taylor expansion: the approximation of a function by a polynomial that matches the function’s value and derivatives at a point. For a function of two variables, the second-order Taylor expansion of $f$ around the point $(x_0, y_0)$ is

$$f(x_0 + h, y_0 + k) \approx f + f_x h + f_y k + \tfrac{1}{2}\left(f_{xx} h^2 + 2 f_{xy} hk + f_{yy} k^2\right),$$

where all partial derivatives are evaluated at $(x_0, y_0)$, $h$ is the displacement in $x$, and $k$ is the displacement in $y$. The notation $f_{xx}$, $f_{xy}$, and $f_{yy}$ denotes the second partial derivatives (derivatives of derivatives).

At a critical point, $f_x = 0$ and $f_y = 0$, so the linear terms vanish. Subtracting the constant $f(x_0, y_0)$, the change in function value is controlled entirely by the quadratic form, a homogeneous polynomial of degree 2 in $h$ and $k$:

$$f(x_0 + h, y_0 + k) - f(x_0, y_0) \approx \tfrac{1}{2}\left(A h^2 + 2B hk + C k^2\right),$$

where we introduce the shorthand

$$A = f_{xx}(x_0, y_0), \qquad B = f_{xy}(x_0, y_0), \qquad C = f_{yy}(x_0, y_0).$$

A quadratic form is an expression of the type $A h^2 + 2B hk + C k^2$, that is, a polynomial in which every term has total degree exactly 2. Quadratic forms play the role that the term $\tfrac{1}{2} f''(x_0)\, h^2$ plays in single-variable calculus: they capture the curvature of the function.
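The claim that the quadratic form controls the change in function value near a critical point can be checked numerically. The sketch below is illustrative only: the function `f` is a hypothetical example chosen here (it is not from the text), and its second partials at the origin are computed by hand.

```python
def f(x, y):
    # Hypothetical test function with a critical point at the origin:
    # f_x = 2x + y + 3x^2 and f_y = x + 4y both vanish at (0, 0).
    return x**2 + x * y + 2 * y**2 + x**3


# Second partials at (0, 0), computed by hand: f_xx = 2, f_xy = 1, f_yy = 4.
A, B, C = 2.0, 1.0, 4.0


def quadratic_approx(h, k):
    """Second-order estimate of f(h, k) - f(0, 0) at the critical point."""
    return 0.5 * (A * h**2 + 2 * B * h * k + C * k**2)


for step in (0.1, 0.01, 0.001):
    h, k = step, -step
    exact = f(h, k) - f(0.0, 0.0)
    approx = quadratic_approx(h, k)
    print(f"step={step}: exact={exact:.6e}  approx={approx:.6e}  error={exact - approx:.3e}")
```

For this particular function the error is exactly the cubic term $h^3$, so it shrinks much faster than the quadratic form itself as the displacement goes to zero.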

The key insight

Near a critical point, the function’s behaviour is determined by the sign of the quadratic form $A h^2 + 2B hk + C k^2$. If this form is always positive (for every displacement $(h, k) \neq (0, 0)$), the critical point is a local minimum. If always negative, a local maximum. If it changes sign depending on direction, a saddle point.
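One rough way to see this sign behaviour is to evaluate the quadratic form on unit displacements in many directions. The following is a heuristic sketch, not a proof: the names `quadratic_form` and `sign_pattern` are made up for illustration, and a finite sample can miss a narrow range of directions.

```python
import math


def quadratic_form(A, B, C, h, k):
    """Value of A h^2 + 2 B h k + C k^2."""
    return A * h**2 + 2 * B * h * k + C * k**2


def sign_pattern(A, B, C, samples=360, tol=1e-12):
    """Sample unit displacements and report which signs the form takes."""
    values = [
        quadratic_form(A, B, C, math.cos(t), math.sin(t))
        for t in (2 * math.pi * i / samples for i in range(samples))
    ]
    has_pos = any(v > tol for v in values)
    has_neg = any(v < -tol for v in values)
    has_zero = any(abs(v) <= tol for v in values)
    if has_pos and has_neg:
        return "saddle"
    if has_zero:
        # The form vanishes in some sampled direction: no verdict.
        return "inconclusive"
    return "minimum" if has_pos else "maximum"


print(sign_pattern(2, 0, 2))
print(sign_pattern(2, 0, -2))
```

Sampling only suggests the answer. An exact criterion needs the coefficients $A$, $B$, $C$ themselves.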

Quadratic form as matrix product: the Hessian

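The quadratic form can be written as a matrix product. Define the column vector $\mathbf{h} = \begin{pmatrix} h \\ k \end{pmatrix}$ and the matrix

$$H = \begin{pmatrix} A & B \\ B & C \end{pmatrix} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}.$$

Then

$$\mathbf{h}^T H \mathbf{h} = \begin{pmatrix} h & k \end{pmatrix} \begin{pmatrix} A & B \\ B & C \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} = A h^2 + 2B hk + C k^2,$$

and so $f(x_0 + h, y_0 + k) - f(x_0, y_0) \approx \tfrac{1}{2}\, \mathbf{h}^T H \mathbf{h}$.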

The matrix $H$ is the Hessian matrix of $f$ at the critical point: the matrix of all second partial derivatives. It is named after the German mathematician Ludwig Otto Hesse (1811–1874), who introduced it in the context of algebraic geometry.

Notice that $B = f_{xy}$ appears in both off-diagonal positions. This is because of Clairaut’s theorem (also called Schwarz’s theorem): if the second partial derivatives $f_{xy}$ and $f_{yx}$ are both continuous, then $f_{xy} = f_{yx}$, so the order of differentiation does not matter. This symmetry is what makes $H$ a symmetric matrix (a matrix equal to its own transpose: $H^T = H$).
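When second partials are awkward to compute by hand, the Hessian can be estimated with finite differences. The sketch below assumes a smooth function, so that Clairaut's theorem justifies using a single mixed-partial estimate for both off-diagonal entries. The helper `numerical_hessian`, the step size `eps`, and the test function `g` are all illustrative choices, not part of the text.

```python
def numerical_hessian(f, x, y, eps=1e-4):
    """Approximate the Hessian of f at (x, y) by central differences."""
    f_xx = (f(x + eps, y) - 2 * f(x, y) + f(x - eps, y)) / eps**2
    f_yy = (f(x, y + eps) - 2 * f(x, y) + f(x, y - eps)) / eps**2
    # One mixed-partial estimate fills both off-diagonal entries,
    # which is justified by Clairaut's theorem when f is smooth.
    f_xy = (f(x + eps, y + eps) - f(x + eps, y - eps)
            - f(x - eps, y + eps) + f(x - eps, y - eps)) / (4 * eps**2)
    return [[f_xx, f_xy], [f_xy, f_yy]]


def g(x, y):
    # Hypothetical smooth function: g_xx = 6xy^2, g_xy = 6x^2 y, g_yy = 2x^3.
    return x**3 * y**2


H = numerical_hessian(g, 1.0, 2.0)
print(H)  # compare with the hand-computed values 24, 12, 12, 2
```

The step size trades truncation error against floating-point cancellation, so the estimate is only good to a few digits.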

Notation bridge

MIT 18.02 uses $A$, $B$, $C$ and checks $AC - B^2$. Politecnico di Torino teaches this as the Hessian matrix $H$, checking $\det H$ and eigenvalues. Same object, same test, different packaging.

The second-derivative test: classification table

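The sign behaviour of the quadratic form determines the nature of the critical point. The classification depends on the determinant of the Hessian, $\det H = AC - B^2$, and the sign of $A = f_{xx}$:

| $\det H$ | $A$ | Classification |
|---|---|---|
| $> 0$ | $> 0$ | Local minimum |
| $> 0$ | $< 0$ | Local maximum |
| $< 0$ | any | Saddle point |
| $= 0$ | any | Degenerate (test inconclusive) |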
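The table translates directly into a small decision procedure. This is a minimal sketch, assuming the three second partials at the critical point have already been computed. The function name and the tolerance `tol` (used to treat a near-zero determinant as degenerate) are illustrative choices.

```python
def classify_critical_point(f_xx, f_xy, f_yy, tol=1e-12):
    """Second-derivative test from the second partials at a critical point."""
    det = f_xx * f_yy - f_xy**2
    if det > tol:
        # det > 0 forces f_xx != 0, so its sign decides between min and max.
        return "local minimum" if f_xx > 0 else "local maximum"
    if det < -tol:
        return "saddle point"
    return "degenerate (test inconclusive)"


print(classify_critical_point(2.0, 0.0, 2.0))
print(classify_critical_point(2.0, 3.0, 2.0))
```

Note that a positive determinant together with a vanishing $f_{xx}$ cannot occur, since the determinant would then equal $-f_{xy}^2 \le 0$.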

Why these conditions work

  • When $\det H > 0$ and $A > 0$: both eigenvalues are positive (see next section), so the quadratic form is always positive and the surface curves upward in every direction. Local minimum.
  • When $\det H > 0$ and $A < 0$: both eigenvalues are negative, so the quadratic form is always negative and the surface curves downward in every direction. Local maximum.
  • When $\det H < 0$: the eigenvalues have opposite signs, so the quadratic form is positive in some directions and negative in others. The surface curves up one way and down another. Saddle point.
  • When $\det H = 0$: at least one eigenvalue is zero, and the second-order information is insufficient. The point could be a minimum, maximum, saddle, or something more exotic. Higher-order derivatives are needed. This case is called degenerate.
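The same conditions can also be seen without eigenvalues, by completing the square. The derivation below is a supplementary sketch that writes the second partials as $f_{xx}$, $f_{xy}$, $f_{yy}$ (all evaluated at the critical point), uses $h$ and $k$ for the displacements in $x$ and $y$, and assumes $f_{xx} \neq 0$:

$$f_{xx} h^2 + 2 f_{xy} hk + f_{yy} k^2 = f_{xx}\left(h + \frac{f_{xy}}{f_{xx}}\, k\right)^2 + \frac{f_{xx} f_{yy} - f_{xy}^2}{f_{xx}}\, k^2.$$

If the determinant $f_{xx} f_{yy} - f_{xy}^2$ is positive, both coefficients on the right share the sign of $f_{xx}$, so the form is positive for every non-zero displacement when $f_{xx} > 0$ and negative when $f_{xx} < 0$. If the determinant is negative, the two coefficients have opposite signs: the displacement $(h, k) = (1, 0)$ gives the sign of $f_{xx}$, while $(h, k) = (-f_{xy}/f_{xx},\, 1)$ gives the opposite sign, so the form is indefinite.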

A quadratic form that is always positive (for all non-zero $\mathbf{h}$) is called positive definite. Always negative: negative definite. Takes both signs: indefinite. A form that vanishes for some non-zero $\mathbf{h}$ but never changes sign (it is non-negative everywhere, or non-positive everywhere) is called semi-definite.

The diagram below shows the contour patterns for each case: elliptical contours closing around a minimum (left), hyperbolic contours at a saddle with eigenvector arrows showing the up/down directions (centre), and elliptical contours around a maximum (right).

Eigenvalue connection

The classification table above is really a statement about the eigenvalues of $H$. An eigenvalue of a matrix $M$ is a scalar $\lambda$ such that $M\mathbf{v} = \lambda \mathbf{v}$ for some non-zero vector $\mathbf{v}$ (called an eigenvector): the matrix acts on $\mathbf{v}$ by simply scaling it. The eigenvalues of a $2 \times 2$ matrix $M$ are the roots of its characteristic polynomial $\det(M - \lambda I)$.

Because $H$ is symmetric, a powerful result from linear algebra guarantees that its eigenvalues are well-behaved:

Spectral Theorem (for real symmetric matrices). Every real symmetric matrix has all real eigenvalues and can be diagonalized by an orthogonal matrix (a matrix whose columns are mutually perpendicular unit vectors). In other words, symmetric matrices have no complex eigenvalues, and their eigenvectors can be chosen to be orthogonal.

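For our $2 \times 2$ Hessian $H$ with eigenvalues $\lambda_1$ and $\lambda_2$, two standard facts from linear algebra connect eigenvalues to the determinant and trace:

$$\det H = \lambda_1 \lambda_2, \qquad \operatorname{tr} H = \lambda_1 + \lambda_2.$$

The trace $\operatorname{tr} H = A + C$ is the sum of the diagonal entries. Now the classification table translates directly into eigenvalue language: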

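| Eigenvalue condition | Quadratic form | Critical point |
|---|---|---|
| $\lambda_1 > 0$, $\lambda_2 > 0$ | Positive definite | Local minimum |
| $\lambda_1 < 0$, $\lambda_2 < 0$ | Negative definite | Local maximum |
| $\lambda_1$ and $\lambda_2$ have opposite signs | Indefinite | Saddle point |
| $\lambda_1 = 0$ or $\lambda_2 = 0$ | Semi-definite | Degenerate |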

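When both eigenvalues are positive, their product $\det H = \lambda_1 \lambda_2 > 0$ and $A > 0$ (because $A$ is the $(1,1)$ entry of a positive-definite matrix, so $A = \mathbf{e}_1^T H \mathbf{e}_1 > 0$ with $\mathbf{e}_1 = (1, 0)^T$). When both are negative, the product is still positive but $A < 0$. When they have opposite signs, the product is negative. This is exactly the test in the previous section.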
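For a symmetric $2 \times 2$ matrix the eigenvalues have a closed form, obtained by solving the characteristic polynomial $\lambda^2 - (\operatorname{tr} H)\lambda + \det H = 0$. The sketch below takes the three second partials as arguments and uses only the standard library. The function name is an illustrative choice. The quantity under the square root, $\left(\tfrac{f_{xx} - f_{yy}}{2}\right)^2 + f_{xy}^2$, is never negative, which is the two-dimensional reason the eigenvalues are always real.

```python
import math


def symmetric_eigenvalues(f_xx, f_xy, f_yy):
    """Eigenvalues of [[f_xx, f_xy], [f_xy, f_yy]] as (smaller, larger)."""
    mean = (f_xx + f_yy) / 2                     # half the trace
    radius = math.hypot((f_xx - f_yy) / 2, f_xy)  # sqrt of a sum of squares
    return mean - radius, mean + radius


lam1, lam2 = symmetric_eigenvalues(2.0, 1.0, 2.0)
print(lam1, lam2)
print("product vs determinant:", lam1 * lam2, 2.0 * 2.0 - 1.0**2)
print("sum vs trace:", lam1 + lam2, 2.0 + 2.0)
```

Comparing the product with the determinant and the sum with the trace is a quick sanity check on any hand computation.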

The eigenvalue perspective also reveals the principal directions of curvature: the eigenvectors of $H$ point in the directions along which the surface curves the most and the least. The corresponding eigenvalues are the curvatures in those directions.

Worked examples

Example 1: (local minimum)

Gradient.

Setting gives the critical point .

Hessian.

Determinant and classification.

$\det H > 0$ and $A > 0$: local minimum.

Eigenvalues. , . Both positive, confirming positive definiteness. The surface is a paraboloid opening upward.


Example 2: (local maximum)

Gradient.

Critical point: .

Hessian.

Determinant and classification.

$\det H > 0$ and $A < 0$: local maximum.

Eigenvalues. , . Both negative, confirming negative definiteness. The surface is an inverted paraboloid.


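Example 3: $f(x, y) = x^2 - y^2$ (saddle point)

Gradient.

$$\nabla f = (2x,\ -2y).$$

Critical point: $(0, 0)$.

Hessian.

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

Determinant and classification.

$$\det H = (2)(-2) - 0^2 = -4.$$

$\det H < 0$: saddle point, regardless of $A$.

Eigenvalues. $\lambda_1 = 2$, $\lambda_2 = -2$. Opposite signs confirm indefiniteness. Along the $x$-axis the surface curves up ($f_{xx} = 2 > 0$); along the $y$-axis it curves down ($f_{yy} = -2 < 0$). The surface is a hyperbolic paraboloid, the classic saddle shape.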


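Example 4: $f(x, y) = x^3$ (degenerate)

Gradient.

$$\nabla f = (3x^2,\ 0).$$

Setting $\nabla f = \mathbf{0}$: $f_x = 3x^2 = 0$ gives $x = 0$, and $y$ is free because $f_y = 0$ everywhere, so the entire $y$-axis consists of critical points. Take $(0, 0)$.

Hessian.

$$H = \begin{pmatrix} 6x & 0 \\ 0 & 0 \end{pmatrix}, \qquad H(0, 0) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Determinant and classification.

$$\det H = 0.$$

$\det H = 0$: degenerate, so the second-derivative test is inconclusive.

Eigenvalues. $\lambda_1 = 0$, $\lambda_2 = 0$. Both zero: the Hessian carries no curvature information at all.

What actually happens. Along the $x$-axis, $f(x, 0) = x^3$, which is increasing (not a minimum or maximum). The origin is an inflection point of the single-variable slice $x \mapsto x^3$. To resolve degenerate cases, one must examine third- or higher-order derivatives, or analyse the function directly. The second-derivative test simply cannot help here.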
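To see why a degenerate Hessian decides nothing, it helps to compare several functions whose second partials all vanish at the origin. The three functions below are hypothetical illustrations chosen here: a quartic bowl, its negative, and a cubic. The helper `behaviour_near_origin` only samples a small circle, so its verdict is suggestive rather than rigorous.

```python
import math


def behaviour_near_origin(f, radius=1e-2, samples=16):
    """Compare f on a small circle around (0, 0) with its value at the centre."""
    diffs = [
        f(radius * math.cos(t), radius * math.sin(t)) - f(0.0, 0.0)
        for t in (2 * math.pi * i / samples for i in range(samples))
    ]
    if all(d > 0 for d in diffs):
        return "looks like a local minimum"
    if all(d < 0 for d in diffs):
        return "looks like a local maximum"
    return "neither"


# All three have a zero gradient and a zero Hessian at the origin.
candidates = {
    "x^4 + y^4": lambda x, y: x**4 + y**4,
    "-(x^4 + y^4)": lambda x, y: -(x**4 + y**4),
    "x^3": lambda x, y: x**3,
}

for name, func in candidates.items():
    print(name, "->", behaviour_near_origin(func))
```

Identical second-order data, three different behaviours: the distinction lives entirely in the higher-order terms.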
