Directional Derivatives and the Gradient

The question

Partial derivatives measure how a function $f (x, y)$ changes when you move along the coordinate axes --- $\frac{\partial f}{\partial x}$ holds $y$ fixed and varies $x$ , while $\frac{\partial f}{\partial y}$ holds $x$ fixed and varies $y$ . But the coordinate axes are arbitrary. There is nothing physically special about the $x$ - or $y$ -direction; they are an artifact of the coordinate system we happened to choose.

The natural question: what is the rate of change of $f$ in an arbitrary direction? If you stand at a point $(x_{0}, y_{0})$ and walk in a direction that is neither pure $x$ nor pure $y$ , how fast does $f$ change per unit distance traveled?

The answer is the directional derivative, and the tool that makes it computable is the gradient vector $\nabla f$ . Both emerge from a single application of the chain rule.

Setup: walking in direction $u$

Choose a unit vector $u = (a, b)$ with $∥ u ∥ = \sqrt{a^{2} + b^{2}} = 1$ . A unit vector is a vector of length 1: it specifies a direction without any scale ambiguity.

Starting at the point $P = (x_{0}, y_{0})$ , walk in the direction $u$ . Your position after traveling a distance $s$ along this direction is

r (s) = (x_{0} + a s, y_{0} + b s) .

This is a straight line through $P$ in the direction $u$ , parametrized by arc length, so the parameter $s$ measures actual distance traveled rather than some arbitrary quantity. See Arc-Length Parametrization for why this matters: because $u$ is a unit vector, $∥ \frac{d r}{d s} ∥ = ∥ u ∥ = 1$ , so the parametrization has unit speed and $s$ is genuinely distance.

From this parametrization, the components of position are functions of $s$ :

x (s) = x_{0} + a s, y (s) = y_{0} + b s .

Their derivatives with respect to $s$ are simply the components of $u$ :

\frac{d x}{d s} = a, \frac{d y}{d s} = b .
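As a quick numerical sanity check of the unit-speed claim, here is a minimal Python sketch (standard library only). The helper name `position` and the sample values of `P`, `u`, and `steps` are illustrative assumptions, not part of the note. It confirms that the straight-line distance from $P$ to $r (s)$ equals $s$ when $u$ is a unit vector.

```python
import math

def position(p, u, s):
    """Point reached after walking a distance s from p in the direction u."""
    return (p[0] + u[0] * s, p[1] + u[1] * s)

P = (1.0, 2.0)       # starting point (x0, y0), chosen only for illustration
u = (3 / 5, 4 / 5)   # unit vector, since (3/5)^2 + (4/5)^2 = 1

steps = (0.5, 1.0, 2.5)
distances = []
for s in steps:
    x, y = position(P, u, s)
    # Euclidean distance actually covered between P and r(s)
    distances.append(math.hypot(x - P[0], y - P[1]))

for s, d in zip(steps, distances):
    print(f"s = {s}: distance from P = {d:.12f}")
```

If `u` were not normalized, the printed distances would be $s$ scaled by $∥ u ∥$ , and $s$ would no longer measure distance.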

The chain rule derivation

Now consider the scalar function $w = f (x, y)$ evaluated along this line. As $s$ varies, both $x$ and $y$ change, so $w$ is a composite function: $w (s) = f (x (s), y (s))$ .

The multivariable chain rule tells us how to differentiate a composite function. If $w = f (x, y)$ is differentiable and $x$ and $y$ are both differentiable functions of a single parameter $s$ , then

\frac{d w}{d s} = \frac{\partial f}{\partial x} \frac{d x}{d s} + \frac{\partial f}{\partial y} \frac{d y}{d s} .

This is the sum of two contributions: the rate of change of $f$ due to $x$ changing (weighted by how fast $x$ changes with $s$ ) plus the rate of change of $f$ due to $y$ changing (weighted by how fast $y$ changes with $s$ ). It generalizes the single-variable chain rule $\frac{d w}{d s} = \frac{d w}{d x} \frac{d x}{d s}$ to the case where $w$ depends on multiple intermediate variables.

Substituting $\frac{d x}{d s} = a$ and $\frac{d y}{d s} = b$ :

\frac{d w}{d s} = f_{x} \cdot a + f_{y} \cdot b,

where $f_{x} = \frac{\partial f}{\partial x}$ and $f_{y} = \frac{\partial f}{\partial y}$ are the partial derivatives of $f$ , evaluated at $(x_{0}, y_{0})$ .

This quantity $\left. \frac{d w}{d s} \right|_{s = 0}$ is the directional derivative of $f$ at $P$ in the direction $u$ , written $D_{u} f (P)$ .
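The chain-rule formula can be checked numerically. The sketch below is an illustration under assumed inputs: the test function $f (x, y) = sin (x) \, y^{2}$ , the point $(0.5, 1.5)$ , the direction $(\frac{3}{5}, \frac{4}{5})$ , and the step size `h` are all chosen for this example and do not come from the note. It compares a central finite difference of $w (s)$ at $s = 0$ with $f_{x} \cdot a + f_{y} \cdot b$ .

```python
import math

def f(x, y):
    # Illustrative test function (an assumption for this sketch)
    return math.sin(x) * y**2

def f_x(x, y):
    # Partial derivative of f with respect to x
    return math.cos(x) * y**2

def f_y(x, y):
    # Partial derivative of f with respect to y
    return 2 * math.sin(x) * y

x0, y0 = 0.5, 1.5    # base point P
a, b = 3 / 5, 4 / 5  # unit direction u

def w(s):
    """f evaluated along the line r(s) = (x0 + a s, y0 + b s)."""
    return f(x0 + a * s, y0 + b * s)

h = 1e-5
finite_difference = (w(h) - w(-h)) / (2 * h)      # approximates dw/ds at s = 0
chain_rule = f_x(x0, y0) * a + f_y(x0, y0) * b    # f_x * a + f_y * b

print(finite_difference, chain_rule)
```

The two printed values should agree to many decimal places. The small remaining gap is the truncation and rounding error of the finite difference, not a failure of the formula.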

Concrete example

Take $f (x, y) = x^{2} y + y^{3}$ . Compute the directional derivative at the point $(1, 2)$ in the direction $u = (\frac{3}{5}, \frac{4}{5})$ (which is a unit vector since $9/25 + 16/25 = 1$ ).

Step 1: Compute the partial derivatives.

f_{x} = \frac{\partial}{\partial x} (x^{2} y + y^{3}) = 2 x y, f_{y} = \frac{\partial}{\partial y} (x^{2} y + y^{3}) = x^{2} + 3 y^{2} .

Step 2: Evaluate at $(1, 2)$ .

f_{x} (1, 2) = 2 \cdot 1 \cdot 2 = 4, f_{y} (1, 2) = 1^{2} + 3 \cdot 2^{2} = 1 + 12 = 13.

Step 3: Apply the formula.

D_{u} f (1, 2) = f_{x} \cdot a + f_{y} \cdot b = 4 \cdot \frac{3}{5} + 13 \cdot \frac{4}{5} = \frac{12}{5} + \frac{52}{5} = \frac{64}{5} = 12.8.

The function $f$ increases at a rate of 12.8 units per unit distance when you walk from $(1, 2)$ in the direction $(\frac{3}{5}, \frac{4}{5})$ .
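The three steps can be reproduced in a few lines of Python. This is a minimal sketch: the function name `grad_f` is an assumption made for illustration, and exact rational arithmetic from the standard `fractions` module is used so that the result is $\frac{64}{5}$ exactly rather than a rounded float.

```python
from fractions import Fraction

def grad_f(x, y):
    """Partial derivatives (f_x, f_y) of f(x, y) = x^2 y + y^3."""
    return (2 * x * y, x**2 + 3 * y**2)

a, b = Fraction(3, 5), Fraction(4, 5)  # direction u = (3/5, 4/5)
fx, fy = grad_f(1, 2)                  # Step 2: evaluate at (1, 2)
D_u = fx * a + fy * b                  # Step 3: f_x * a + f_y * b

print(D_u, float(D_u))  # prints: 64/5 12.8
```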

Repackaging as a dot product: the gradient

Look at the formula again:

D_{u} f = f_{x} \cdot a + f_{y} \cdot b .

The right side is a dot product --- the sum of component-wise products of two vectors. The dot product of vectors $v = (v_{1}, v_{2})$ and $w = (w_{1}, w_{2})$ is $v \cdot w = v_{1} w_{1} + v_{2} w_{2}$ . Recognizing this pattern, define the gradient of $f$ :

\nabla f = (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}) = (f_{x}, f_{y}) .

The symbol $\nabla$ is called “nabla” or “del.” The gradient $\nabla f$ is a vector whose components are the partial derivatives of $f$ . It is not new mathematics --- it is a repackaging of information we already had (the partial derivatives) into a single vector object that makes the directional derivative formula clean:

D_{u} f = \nabla f \cdot u .

The directional derivative in any direction $u$ is the dot product of the gradient with $u$ .

The gradient in higher dimensions

Everything generalizes immediately. For $f (x_{1}, x_{2}, \dots, x_{n})$ , the gradient is $\nabla f = (\frac{\partial f}{\partial x _{1}}, \frac{\partial f}{\partial x _{2}}, \dots, \frac{\partial f}{\partial x _{n}})$ , and $D_{u} f = \nabla f \cdot u$ for any unit vector $u \in R^{n}$ .

For the two-variable example $f (x, y) = x^{2} y + y^{3}$ above: $\nabla f (1, 2) = (4, 13)$ , and $D_{u} f = (4, 13) \cdot (\frac{3}{5}, \frac{4}{5}) = \frac{64}{5}$ . Same answer, but now the computation has a geometric shape.
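The $n$ -dimensional formula $D_{u} f = \nabla f \cdot u$ translates directly into code. The sketch below is one possible approach, not the only one: it approximates the gradient by central differences instead of symbolic partial derivatives. The names `numerical_gradient` and `directional_derivative`, the step size `h`, and the three-variable test function $f (x, y, z) = x y + z^{2}$ are assumptions made for illustration.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate each partial derivative of f at point by a central difference."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad

def directional_derivative(f, point, u):
    """Dot product of the approximate gradient with the unit vector u."""
    return sum(g * c for g, c in zip(numerical_gradient(f, point), u))

def f(x, y, z):
    # Illustrative function; its exact gradient is (y, x, 2z)
    return x * y + z**2

point = (1.0, 2.0, 3.0)
u = (1 / 3, 2 / 3, 2 / 3)  # unit vector: 1/9 + 4/9 + 4/9 = 1

D_u = directional_derivative(f, point, u)
print(D_u)
```

For this function the exact gradient at $(1, 2, 3)$ is $(2, 1, 6)$ , so the exact directional derivative is $\frac{2}{3} + \frac{2}{3} + 4 = \frac{16}{3}$ , and the numerical result should match it closely.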

Three geometric consequences

The power of the dot product formula comes from a fundamental identity. For any two vectors $a$ and $b$ , the dot product satisfies

a \cdot b = ∥ a ∥ ∥ b ∥ cos θ,

where $θ$ is the angle between the two vectors. Since $u$ is a unit vector ( $∥ u ∥ = 1$ ), this simplifies to

D_{u} f = \nabla f \cdot u = ∥\nabla f ∥ cos θ,

where $θ$ is the angle between $\nabla f$ and $u$ (defined whenever $\nabla f \neq 0$ ). The directional derivative depends only on the magnitude of the gradient and the angle $θ$ . This gives three immediate results:

Maximum rate of increase

When $θ = 0$ (you walk in the same direction as $\nabla f$ ), $cos 0 = 1$ and

D_{u} f = ∥\nabla f ∥.

This is the largest possible directional derivative. The gradient points in the direction of steepest ascent, and its magnitude $∥\nabla f ∥$ is the rate of that steepest ascent.

Maximum rate of decrease

When $θ = π$ (you walk directly opposite to $\nabla f$ ), $cos π = - 1$ and

D_{u} f = - ∥\nabla f ∥.

This is the most negative directional derivative. Walking opposite the gradient gives the steepest descent.

Zero change

When $θ = \frac{π}{2}$ (you walk perpendicular to $\nabla f$ ), $cos \frac{π}{2} = 0$ and

D_{u} f = 0.

Walking perpendicular to the gradient, $f$ does not change at all (to first order). This is not a coincidence --- it is the key to the next section.
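The three cases can be illustrated numerically. The sketch below takes the gradient $(4, 13)$ from the earlier example, builds the unit vector at angle $θ$ from it, and evaluates $\nabla f \cdot u$ at $θ = 0$ , $π$ , and $\frac{π}{2}$ . It also sweeps all directions in one-degree steps to confirm that none exceeds $∥\nabla f ∥$ . The helper name `directional_derivative` and the one-degree sweep are assumptions of this sketch.

```python
import math

grad = (4.0, 13.0)                         # gradient of the example at (1, 2)
grad_norm = math.hypot(*grad)              # magnitude of the gradient
grad_angle = math.atan2(grad[1], grad[0])  # polar angle of the gradient

def directional_derivative(theta):
    """grad . u for the unit vector u that makes angle theta with grad."""
    u = (math.cos(grad_angle + theta), math.sin(grad_angle + theta))
    return grad[0] * u[0] + grad[1] * u[1]

steepest_ascent = directional_derivative(0.0)       # theta = 0
steepest_descent = directional_derivative(math.pi)  # theta = pi
no_change = directional_derivative(math.pi / 2)     # theta = pi/2

# Sweep every direction in 1-degree steps and record the largest value.
sweep_max = max(directional_derivative(math.radians(d)) for d in range(360))

print(grad_norm, steepest_ascent, steepest_descent, no_change, sweep_max)
```

Up to floating-point rounding, the first, second, and last printed values coincide, the third is their negative, and the fourth is zero.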

The punchline: the gradient is normal to level curves

A level curve (also called a contour) of $f (x, y)$ is a curve in the $x y$ -plane along which $f$ takes a constant value: ${(x, y) : f (x, y) = k}$ for some constant $k$ . For example, the level curves of $f (x, y) = x^{2} + y^{2}$ are circles centered at the origin.

The gradient is perpendicular to level curves

At any point on a level curve $f (x, y) = k$ , the gradient $\nabla f$ is perpendicular (normal) to the level curve.

Proof. Let $u$ be a unit tangent vector to the level curve at some point $P$ . “Tangent to the level curve” means $u$ points along the curve --- in a direction where $f$ stays constant. Since $f$ does not change along the level curve, the rate of change of $f$ in the direction $u$ is zero:

$D_{u} f = 0.$

But $D_{u} f = \nabla f \cdot u$ . So

$\nabla f \cdot u = 0.$

A dot product of zero means the two vectors are orthogonal (perpendicular), provided neither is the zero vector. Therefore $\nabla f ⊥ u$ . Since this holds for every tangent direction $u$ to the level curve, $\nabla f$ is normal to the level curve at $P$ . $■$

This is one of the most important results in multivariable calculus. It connects two seemingly different ideas --- the algebraic object $\nabla f$ (a vector of partial derivatives) and the geometric object “level curve” (a contour of constant $f$ ) --- through the directional derivative.

The gradient does double duty: it tells you the direction of steepest increase and it tells you which way is “outward” from a level curve. These are the same thing --- the steepest way to increase $f$ is to walk directly away from the current contour toward higher-valued contours.

Worked example: gradient perpendicular to a circle

Take $f (x, y) = x^{2} + y^{2}$ .

Step 1: Compute the gradient.

\nabla f = (2 x, 2 y) .

Step 2: Evaluate at the point $(1, 1)$ .

\nabla f (1, 1) = (2, 2) .

Step 3: Identify the level curve through $(1, 1)$ .

f (1, 1) = 1^{2} + 1^{2} = 2,

so the level curve is $x^{2} + y^{2} = 2$ , a circle of radius $\sqrt{2}$ centered at the origin.

Step 4: Find a tangent vector to the circle at $(1, 1)$ .

The circle $x^{2} + y^{2} = 2$ can be parametrized as $r (t) = (\sqrt{2} cos t, \sqrt{2} sin t)$ . The point $(1, 1) = (\sqrt{2} cos \frac{π}{4}, \sqrt{2} sin \frac{π}{4})$ corresponds to $t = \frac{π}{4}$ . The tangent vector is

r^{'} (t) = (- \sqrt{2} sin t, \sqrt{2} cos t) .

At $t = \frac{π}{4}$ :

r^{'} (\frac{π}{4}) = (- \sqrt{2} \cdot \frac{\sqrt{2}}{2}, \sqrt{2} \cdot \frac{\sqrt{2}}{2}) = (- 1, 1) .

Step 5: Verify perpendicularity.

\nabla f (1, 1) \cdot r^{'} (\frac{π}{4}) = (2, 2) \cdot (- 1, 1) = 2 \cdot (- 1) + 2 \cdot 1 = - 2 + 2 = 0.

The dot product is zero, confirming that the gradient $(2, 2)$ is perpendicular to the tangent direction $(- 1, 1)$ at the point $(1, 1)$ .
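The same check can be repeated at other points of the level curve $x^{2} + y^{2} = 2$ . The sketch below is an illustration: the sample parameter values in `ts` and the name `grad_f` are assumptions. It computes the radius from the level value $f (1, 1)$ , parametrizes the circle, and takes the dot product of the gradient with the tangent vector at each sample point.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

level = 1**2 + 1**2        # f(1, 1), the value of f on this level curve
radius = math.sqrt(level)  # radius of the circle x^2 + y^2 = level

ts = (math.pi / 4, 1.0, 2.0, 4.5)  # t = pi/4 is the point (1, 1)
dot_products = []
for t in ts:
    point = (radius * math.cos(t), radius * math.sin(t))
    tangent = (-radius * math.sin(t), radius * math.cos(t))  # r'(t)
    g = grad_f(*point)
    dot_products.append(g[0] * tangent[0] + g[1] * tangent[1])

print(dot_products)
```

Every entry should be zero up to floating-point rounding, which matches the claim that the gradient is normal to the level curve at each point and not only at $(1, 1)$ .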

Geometric picture

At every point on the circle $x^{2} + y^{2} = 2$ , the gradient $\nabla f = (2 x, 2 y)$ points radially outward from the origin, while the tangent to the circle points along the circumference. Radii and tangent lines of a circle are always perpendicular --- the gradient-level-curve relationship is the general version of this familiar geometric fact.

The diagram below shows concentric level curves of $f (x, y) = x^{2} + y^{2}$ with gradient arrows (red) pointing radially outward. At the upper-left point, a green tangent segment and a right-angle marker confirm the perpendicularity.
