Lagrange Multipliers
The problem: optimization with constraints
In unconstrained optimization (see Critical Points and the Hessian), we find the extrema of a function $f$ by setting $\nabla f = \mathbf{0}$ (see Directional Derivatives and the Gradient for the gradient) and solving. This works because at an interior extremum, the directional derivative is zero in every direction, which forces the gradient to vanish.
But many real problems come with constraints --- restrictions that confine the domain to a specific curve or surface. For example:
- Minimize cost $f(x, y)$ subject to a production requirement $g(x, y) = c$.
- Find the closest point on a curve to the origin: minimize $x^2 + y^2$ subject to the curve equation $g(x, y) = c$.
- Maximize area subject to a fixed perimeter.
In these problems, we cannot simply set $\nabla f = \mathbf{0}$. The extremum may occur at a point where $\nabla f \neq \mathbf{0}$: the gradient does not vanish, but the extremum exists because we are restricted to the constraint curve $g(x, y) = c$. We need a different approach.
The method of Lagrange multipliers, presented by Joseph-Louis Lagrange in his Mécanique analytique (1788), provides that approach. There are two ways to derive it: a geometric argument based on tangent level curves, and an analytic argument based on the chain rule. Both lead to the same condition.
Path 1 --- Geometric derivation (tangent level curves)
Tangency from first principles
Before we can state the geometric argument, we need a precise definition of what it means for two curves to be tangent.
A smooth curve in $\mathbb{R}^2$ is a set of points that can be described by a parametrization: a vector-valued function $\mathbf{r}(t) = (x(t), y(t))$ where $x(t)$ and $y(t)$ are differentiable functions of a parameter $t$ (see Arc-Length Parametrization for parametrizations and the role of arc length). The tangent vector at a point $P = \mathbf{r}(t_0)$ is
$$\mathbf{r}'(t_0) = (x'(t_0),\, y'(t_0))$$
This vector, assumed nonzero, points in the instantaneous direction of motion along the curve. The tangent line at $P$ is the unique line through $P$ in the direction of $\mathbf{r}'(t_0)$:
$$\boldsymbol{\ell}(u) = \mathbf{r}(t_0) + u\,\mathbf{r}'(t_0), \qquad u \in \mathbb{R}$$
Now consider two smooth curves $C_1$ and $C_2$ that pass through the same point $P$. We say the curves are tangent at $P$ when they share the same tangent line at $P$; equivalently, their tangent vectors at $P$ are parallel (one is a scalar multiple of the other).
Tangent vs. intersecting
Two curves can intersect at a point without being tangent --- they cross at an angle, like an X. Tangency means they touch and locally point in the same direction, like two roads merging smoothly.

Concrete example. The diagram above shows the circle meeting two different curves at the same point . Left: the parabola intersects the circle --- the tangent vectors point in different directions (). Right: the line is tangent to the circle --- the tangent vectors are parallel.
Let’s verify this with computation. Parametrize the circle as and the parabola as .
Tangent vector of the circle at . The point corresponds to :
Tangent vector of the parabola at . The point corresponds to :
Are and parallel? Two vectors are parallel if one is a scalar multiple of the other: requires from the first component and from the second. These are inconsistent, so the vectors are not parallel. The circle and parabola intersect at but are not tangent there --- they cross at an angle.
For a tangent point, we would need a curve whose tangent vector at the intersection is a scalar multiple of . The line passes through with direction vector , so this line is tangent to the circle at --- as shown in the right panel.
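The parallel-vector test above can be sketched in a few lines of Python. The specific curves here are illustrative assumptions, not necessarily the ones in the figure: a circle of radius $\sqrt{2}$, the parabola $y = x^2$, and the line $x + y = 2$, all passing through $(1, 1)$. Two plane vectors are parallel exactly when their 2D cross product $u_1 v_2 - u_2 v_1$ is zero, which avoids solving for the scalar multiple component by component.

```python
import math

def tangent_circle(t, radius):
    """Derivative of the parametrization r(t) = (R cos t, R sin t)."""
    return (-radius * math.sin(t), radius * math.cos(t))

def tangent_parabola(t):
    """Derivative of the parametrization r(t) = (t, t^2)."""
    return (1.0, 2.0 * t)

def are_parallel(u, v, tol=1e-9):
    """Nonzero plane vectors are parallel when their 2D cross product vanishes."""
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# Assumed circle x^2 + y^2 = 2, which reaches (1, 1) at t = pi/4.
v_circle = tangent_circle(math.pi / 4, math.sqrt(2))
# Assumed parabola y = x^2, which reaches (1, 1) at t = 1.
v_parabola = tangent_parabola(1.0)
# Assumed line x + y = 2 through (1, 1), with direction vector (1, -1).
v_line = (1.0, -1.0)

print(are_parallel(v_circle, v_parabola))  # False: the curves cross at an angle
print(are_parallel(v_circle, v_line))      # True: the line is tangent to the circle
```

The tolerance `tol` is needed because the circle's tangent vector is computed in floating point and is only approximately $(-1, 1)$.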
The level curve picture
Now apply the tangency concept to constrained optimization. Consider the function $f(x, y)$ and the constraint $g(x, y) = c$.
A level curve (also called a contour) of $f$ is the set of all points where $f$ takes a specific constant value: $f(x, y) = k$ (see Directional Derivatives and the Gradient for the full treatment of level curves and their relationship to the gradient). As $k$ increases, the level curves of $f$ sweep across the plane. Think of a topographic map: each contour line is a level curve of the elevation function, and higher contour values correspond to higher ground.
Walking along the constraint

Imagine drawing the constraint curve on top of these level curves. The diagram above shows this for (whose level curves are circles centered at the origin) with the constraint (the dark line).
Now walk along the constraint and track how changes:
| Step along the constraint | What happens |
|---|---|
| 1 | Starting point, far from the origin |
| 2 | Crossed inward from a larger circle to a smaller one: $f$ decreased, keep going |
| 3 | Reached the smallest circle: the constraint just touches this circle |
| 4 | Crossed back outward to a larger circle: $f$ increased, we passed the minimum |
| 5 | Back at the starting value of $f$, symmetric with the start |
At a crossing point such as the starting point, the level curve of $f$ crosses the constraint line: the curves intersect transversally. You can keep walking and reach a smaller value of $f$. This is not the minimum.
At the touching point, the level curve of $f$ touches the constraint without crossing it. Walking in either direction takes you to larger circles (larger values of $f$). This is the constrained minimum, and the level curve is tangent to the constraint here.
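The walk can be imitated numerically. The sketch below assumes $f(x, y) = x^2 + y^2$ and, purely for illustration, the constraint line $x + y = 2$ (the exact line in the figure is not assumed here). It samples points along the line, prints $f$ at each one, and reports where $f$ is smallest.

```python
def f(x, y):
    """Squared distance to the origin; its level curves are circles."""
    return x * x + y * y

def point_on_line(t, c=2.0):
    """Point on the assumed constraint x + y = c, parametrized as (t, c - t)."""
    return (t, c - t)

# Walk from t = -1 to t = 3 in steps of 0.25.
ts = [i / 4 for i in range(-4, 13)]
values = [f(*point_on_line(t)) for t in ts]

for t, value in zip(ts, values):
    x, y = point_on_line(t)
    print(f"({x:5.2f}, {y:5.2f})  f = {value:6.3f}")

# Index of the sample with the smallest value of f.
best = min(range(len(ts)), key=lambda i: values[i])
print("smallest f on the walk:", values[best], "at", point_on_line(ts[best]))
```

Under these assumptions the printed values fall until the walk reaches $(1, 1)$ and rise again afterwards, mirroring the decrease, touch, increase pattern described above.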
Core geometric insight
Constrained extrema of $f$ on the curve $g(x, y) = c$ occur where a level curve of $f$ is tangent to the constraint curve $g(x, y) = c$.

The diagram above zooms in on the tangent point. The parallel gradient arrows $\nabla f$ (blue) and $\nabla g$ (red) at the tangent point confirm the Lagrange condition $\nabla f = \lambda \nabla g$: the gradients lie along the same line because the curves are tangent.
Tangency implies parallel normals
At a tangent point, the level curve of $f$ and the constraint curve $g(x, y) = c$ share the same tangent line. A normal vector to a curve at a point is any vector perpendicular to the tangent line at that point.
Since the two curves share the same tangent line, any vector perpendicular to that tangent line is simultaneously normal to both curves. In particular, the normal vectors of the two curves must point along the same line --- they are parallel (one is a scalar multiple of the other).
The gradient is the normal
From Directional Derivatives and the Gradient, the gradient $\nabla f$ at a point is perpendicular to the level curve of $f$ through that point. Similarly, $\nabla g$ is perpendicular to the level curve of $g$, and the constraint $g(x, y) = c$ is itself a level curve of $g$.
So:
- The normal to the level curve of $f$ at the tangent point is $\nabla f$.
- The normal to the constraint curve $g(x, y) = c$ at the tangent point is $\nabla g$.
The condition “normals are parallel” becomes:
$$\nabla f = \lambda \nabla g$$
for some scalar $\lambda$, called the Lagrange multiplier.
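A quick numerical check of the parallel-gradient condition, as a sketch under stated assumptions: $f(x, y) = x^2 + y^2$, the illustrative constraint function $g(x, y) = x + y$, and gradients approximated by central differences. The helper names `grad` and `cross` are hypothetical. The 2D cross product of the two gradients should vanish at a tangent point such as $(1, 1)$ and be nonzero at a point such as $(2, 0)$ where the level curve merely crosses the line.

```python
def grad(func, x, y, h=1e-6):
    """Central-difference approximation of the gradient of func at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

def cross(u, v):
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def f(x, y):
    return x * x + y * y

def g(x, y):
    return x + y  # assumed constraint function; the constraint is g = 2

for point in [(1.0, 1.0), (2.0, 0.0)]:
    grad_f = grad(f, *point)
    grad_g = grad(g, *point)
    print(point, "cross product of gradients:", round(cross(grad_f, grad_g), 6))
```

At $(1, 1)$ the ratio of matching components, `grad_f[0] / grad_g[0]`, gives the multiplier $\lambda$ for this assumed problem, approximately 2.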
Path 2 --- Analytic derivation ($df/ds = 0$ on the constraint)
Restriction to the constraint

The key idea of Path 2 is to forget the 2D plane entirely and view $f$ as a function of a single variable: position along the constraint. The diagram above shows both views side by side. Left: the familiar 2D picture with level curves and constraint. Right: the same information collapsed to one dimension, with $f$ plotted against position along the constraint line.
The 1D picture makes the extremum obvious: restricted to the line, $f$ is a parabola with a single minimum. At the minimum, the curve is flat: $df/ds = 0$, where $s$ is the position along the constraint. This is just single-variable calculus.
To make this precise: parametrize the constraint by arc length $s$. Then $f$ becomes a function of the single variable $s$: the value of $f$ at whatever point on the constraint corresponds to position $s$.
If $f$ has a constrained maximum or minimum at a point $P$ on the constraint, then $f$-as-a-function-of-$s$ has a local maximum or minimum at the corresponding $s_0$. By single-variable calculus, a necessary condition for a local extremum is
$$\left.\frac{df}{ds}\right|_{s = s_0} = 0$$
Clarification
$df/ds = 0$ does not mean $f$ is constant on the constraint. It means that at this specific point, $f$ (viewed as a function of position along the constraint) has a local extremum. Like a hilltop on a mountain trail: the trail keeps going, but the elevation momentarily stops changing. Before and after this point, $f$ may well be increasing or decreasing along the constraint.
Chain rule
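Let $\mathbf{T}$ be the unit tangent vector to the constraint curve at $P$. The condition “$f$ has a constrained extremum at $P$” can be rephrased as: the derivative of $f$ along any tangent vector of the constraint must be zero at $P$. This is exactly the single-variable extremum condition applied to the restriction of $f$ to the constraint.
By the chain rule, with $\mathbf{r}(s)$ the arc-length parametrization of the constraint:
$$\frac{df}{ds} = \nabla f \cdot \frac{d\mathbf{r}}{ds} = \nabla f \cdot \mathbf{T}$$
At the constrained extremum, $df/ds = 0$, so
$$\nabla f \cdot \mathbf{T} = 0$$
This means $\nabla f$ is perpendicular to $\mathbf{T}$. But $\nabla g$ is also perpendicular to $\mathbf{T}$, because $\nabla g$ is perpendicular to the level curves of $g$ (and the constraint is a level curve of $g$).
In $\mathbb{R}^2$, there is only one independent direction perpendicular to a given line direction. Since both $\nabla f$ and $\nabla g$ are perpendicular to the same tangent direction $\mathbf{T}$, they must be parallel. Provided $\nabla g \neq \mathbf{0}$ at $P$, this means
$$\nabla f = \lambda \nabla g$$
for some scalar $\lambda$.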
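The chain-rule identity $df/ds = \nabla f \cdot \mathbf{T}$ can be checked numerically. This sketch assumes $f(x, y) = x^2 + y^2$ and the illustrative constraint $x + y = 2$, parametrized by arc length $s$ measured from $(1, 1)$. The function names are hypothetical. It compares a finite-difference derivative of $f$ along the line with the dot product of the exact gradient and the unit tangent.

```python
import math

def f(x, y):
    return x * x + y * y

def grad_f(x, y):
    """Exact gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

# Unit tangent of the assumed line x + y = 2.
T = (1 / math.sqrt(2), -1 / math.sqrt(2))

def point(s):
    """Arc-length parametrization of the line, with s = 0 at (1, 1)."""
    return (1 + s * T[0], 1 + s * T[1])

def df_ds(s, h=1e-6):
    """Central-difference derivative of f restricted to the line."""
    return (f(*point(s + h)) - f(*point(s - h))) / (2 * h)

def directional(s):
    """Chain-rule value: gradient of f dotted with the unit tangent."""
    gx, gy = grad_f(*point(s))
    return gx * T[0] + gy * T[1]

for s in (-1.0, 0.0, 1.5):
    print(f"s = {s:4.1f}  df/ds = {df_ds(s):9.6f}  grad f . T = {directional(s):9.6f}")
```

Both columns agree, and both are zero at $s = 0$, the point where $\nabla f$ is perpendicular to $\mathbf{T}$.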
Both paths agree
Path 1 (tangent level curves) and Path 2 ($df/ds = 0$) produce the same necessary condition: $\nabla f = \lambda \nabla g$. The geometric argument gives visual intuition: constrained extrema occur where level curves kiss the constraint. The analytic argument gives a rigorous chain-rule proof. Together, they reinforce the result from complementary perspectives.
Completing the solution
The system of equations
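For a function $f(x, y)$ subject to a single constraint $g(x, y) = c$, the condition $\nabla f = \lambda \nabla g$ expands to two scalar equations:
$$\frac{\partial f}{\partial x} = \lambda \frac{\partial g}{\partial x}, \qquad \frac{\partial f}{\partial y} = \lambda \frac{\partial g}{\partial y}$$
Adding the constraint itself gives a third equation:
$$g(x, y) = c$$
This is a system of three equations in three unknowns: $x$, $y$, and $\lambda$. Solving this system yields the candidate points for constrained extrema.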
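When $f$ is quadratic and $g$ is linear, the three equations are linear in $(x, y, \lambda)$ and can be solved directly. The sketch below assumes $f(x, y) = x^2 + y^2$ and the illustrative constraint $x + y = 2$, so the system reads $2x - \lambda = 0$, $2y - \lambda = 0$, $x + y = 2$. The helper `solve_linear` is a hypothetical small Gauss-Jordan routine using exact fractions; general nonlinear systems would need a different solver.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with pivoting, in exact arithmetic."""
    n = len(b)
    # Augmented matrix [a | b] with Fraction entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]  # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# Unknowns ordered as (x, y, lam) for the assumed problem.
A = [[2, 0, -1],   # 2x - lam = 0
     [0, 2, -1],   # 2y - lam = 0
     [1, 1, 0]]    # x + y   = 2
b = [0, 0, 2]

x, y, lam = solve_linear(A, b)
print(x, y, lam)  # 1 1 2
```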
Higher dimensions
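For $f(x_1, \dots, x_n)$ subject to $g(x_1, \dots, x_n) = c$, the condition $\nabla f = \lambda \nabla g$ gives $n$ scalar equations. Together with the constraint, that is $n + 1$ equations in $n + 1$ unknowns ($x_1, \dots, x_n, \lambda$). For $m$ constraints $g_1 = c_1, \dots, g_m = c_m$, the condition becomes $\nabla f = \sum_{i=1}^{m} \lambda_i \nabla g_i$, giving $n + m$ equations in $n + m$ unknowns.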
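To make the counting concrete, here is the system written out for an illustrative case with $n = 3$ variables and $m = 2$ constraints. The choice of $n$ and $m$ is only an example.

```latex
% n = 3 variables, m = 2 constraints: 3 + 2 = 5 equations
\begin{aligned}
\frac{\partial f}{\partial x_j}
  &= \lambda_1 \frac{\partial g_1}{\partial x_j}
   + \lambda_2 \frac{\partial g_2}{\partial x_j},
  \qquad j = 1, 2, 3, \\
g_1(x_1, x_2, x_3) &= c_1, \\
g_2(x_1, x_2, x_3) &= c_2.
\end{aligned}
```

That is five equations in the five unknowns $x_1, x_2, x_3, \lambda_1, \lambda_2$.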
What $\lambda$ means: the shadow price
The multiplier $\lambda$ is not just an algebraic device for solving the system; it has a direct interpretation. Let $f^*(c)$ denote the optimal value of $f$ as a function of the constraint level $c$. Then
$$\frac{df^*}{dc} = \lambda$$
That is, $\lambda$ measures how much the optimum improves (or worsens) per unit relaxation of the constraint. In economics, this is called the shadow price: the marginal value of relaxing the constraint by one unit. A large $|\lambda|$ means the constraint is binding tightly: loosening it even slightly would significantly change the optimal value. A small $|\lambda|$ means the constraint is nearly slack.
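A short derivation shows why the multiplier equals the rate of change of the optimal value. It is a sketch that assumes the optimal point $(x^*(c), y^*(c))$ depends differentiably on $c$, and it uses the convention $\nabla f = \lambda \nabla g$ with the constraint written as $g(x, y) = c$.

```latex
\begin{aligned}
f^*(c) &= f\bigl(x^*(c),\, y^*(c)\bigr), \\
\frac{df^*}{dc}
  &= \nabla f \cdot \left(\frac{dx^*}{dc},\, \frac{dy^*}{dc}\right)
   = \lambda\, \nabla g \cdot \left(\frac{dx^*}{dc},\, \frac{dy^*}{dc}\right)
   = \lambda\, \frac{d}{dc}\, g\bigl(x^*(c),\, y^*(c)\bigr)
   = \lambda\, \frac{d}{dc}\,(c)
   = \lambda .
\end{aligned}
```

The second equality uses the Lagrange condition at the optimum, and the fourth uses the fact that the optimal point always satisfies the constraint $g = c$.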
Worked example
Problem. Minimize subject to .
Geometrically: find the point on the line that is closest to the origin (since is the squared distance to the origin).
Step 1: Compute the gradients.
Step 2: Set up .
From the first equation, . From the second, . Therefore , so .
Step 3: Apply the constraint.
Step 4: Compute and .
The closest point on the line to the origin is , at squared distance .
Step 5: Verify the shadow price interpretation.
Change the constraint to and re-solve. The same argument gives , so
The change in optimal value is
The small discrepancy ( vs. ) is the second-order error --- is exact only in the limit . But even for a finite step of , the approximation is very close, confirming the interpretation.
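The shadow-price check can also be run numerically. The numbers below are assumptions chosen for illustration and may differ from those in the worked example: $f(x, y) = x^2 + y^2$, constraint $x + y = c$ with $c = 2$, and a relaxation step of $0.1$. The function `solve` is a hypothetical helper that uses the closed form obtained from $x = y$ and the constraint.

```python
def solve(c):
    """Closed-form solution of: minimize x^2 + y^2 subject to x + y = c."""
    x = y = c / 2          # from x = y together with x + y = c
    lam = 2 * x            # from the gradient equation 2x = lam
    return x, y, lam, x * x + y * y

c, dc = 2.0, 0.1           # assumed constraint level and relaxation step

_, _, lam, f_star = solve(c)
_, _, _, f_star_new = solve(c + dc)

actual = f_star_new - f_star   # true change in the optimal value
predicted = lam * dc           # first-order prediction from the multiplier

print("actual change:   ", round(actual, 6))
print("predicted change:", round(predicted, 6))
print("difference:      ", round(actual - predicted, 6))
```

For this assumed problem $f^*(c) = c^2/2$, so the difference between the actual and predicted change is exactly the second-order term $(\Delta c)^2 / 2$.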
See also
- Directional Derivatives and the Gradient --- the gradient-perpendicular-to-level-curves result that underpins both derivation paths
- Critical Points and the Hessian --- unconstrained optimization and the second-derivative test; Lagrange multipliers handle the constrained case
- Arc-Length Parametrization --- parametrization by arc length used in Path 2 to restrict to the constraint curve