Multivariable Differential Calculus | An Introduction to Real Analysis

In this chapter, we consider the differential calculus of mappings from one Euclidean space to another, that is, mappings

F : R^{n} \to R^{m}

. In first-year calculus, you considered the case

n = 2

or

n = 3

and

m = 1

. Examples of functions that you might have encountered were of the type

F (x_{1}, x_{2}) = x_{1}^{2} - x_{2}^{2}

,

F (x_{1}, x_{2}, x_{3}) = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}

, or maybe even

F (x_{1}, x_{2}) = \sin (x_{1}) \sin (x_{2})

, etc. If now

F : R^{n} \to R^{m}

with

m \geq 2

then

F

has

m

component functions since

F (x) \in R^{m}

. We can therefore write

F (x) = (f_{1} (x), f_{2} (x), \dots, f_{m} (x))

and

f_{j} : R^{n} \to R

is called the

j

th component of

F

. In this chapter, unless stated otherwise, we equip

R^{n}

with the Euclidean 2-norm

{‖ x ‖}_{2} = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{n}^{2}}

. For this reason, we will omit the subscript in

{‖ x ‖}_{2}

and simply write

‖ x ‖

Differentiation

Let

U \subset R^{n}

and let

F : U \to R^{m}

be a function. How should we define differentiability of

F

at some point

a \in U

? Recall that for a function

f : I \to R

, where

I \subset R

, we say that

f

is differentiable at

a \in I

if the limit

lim_{x \to a} \frac{f (x) - f (a)}{x - a}

exists. In this case, we denote

f^{'} (a) = lim_{x \to a} \frac{f (x) - f (a)}{x - a}

and we call

f^{'} (a)

the derivative of

f

at

a

. As it is written, the above definition does not make sense for

F

since division of vectors is not well-defined (or at least we have not defined it). An equivalent definition of differentiability of

f

at

a

is that there exists a number

m \in R

such that

lim_{x \to a} \frac{f (x) - f (a) - m (x - a)}{x - a} = 0

which is equivalent to asking that

lim_{x \to a} \frac{| f (x) - f (a) - m (x - a) |}{| x - a |} = 0.

The number

m

is then denoted by

m = f^{'} (a)

as before. Another way to think about the derivative

m

is that the affine function

g (x) = f (a) + m (x - a)

is a good approximation to

f (x)

for points

x

near

a

. The linear part of the affine function

g

is

ℓ (x) = m x

. Thought of in this way, the derivative of

f

at

a

is a linear function.

Let

U

be a subset of

R^{n}

. A mapping

F : U \to R^{m}

is said to be differentiable at

a \in U

if there exists a linear mapping

L : R^{n} \to R^{m}

such that

lim_{x \to a} \frac{‖ F (x) - F (a) - L (x - a) ‖}{‖ x - a ‖} = 0.

In the definition of differentiability, the expression

L (x - a)

denotes the linear mapping

L

applied to the vector

(x - a) \in R^{n}

. An equivalent definition of differentiability is that

lim_{h \to 0} \frac{‖ F (a + h) - F (a) - L (h) ‖}{‖ h ‖} = 0

where again

L (h)

denotes

L

evaluated at

h \in R^{n}

. It is not hard to show that the linear mapping

L

in the above definition is unique when

U \subset R^{n}

is an open set. For this reason, we will deal almost exclusively with the case that

U

is open without further mention. We therefore call

L

the derivative of $F$ at $a$ and denote it instead by

L = D F (a)

. Hence, by definition, the derivative of

F

at

a

is the unique linear mapping

D F (a) : R^{n} \to R^{m}

satisfying

lim_{x \to a} \frac{‖ F (x) - F (a) - D F (a) (x - a) ‖}{‖ x - a ‖} = 0.

Applying the definition of the limit, given arbitrary

ε > 0

there exists

δ > 0

such that if

0 < ‖ x - a ‖ < δ

then

\frac{‖ F (x) - F (a) - D F (a) (x - a) ‖}{‖ x - a ‖} < ε

or equivalently

‖ F (x) - F (a) - D F (a) (x - a) ‖ < ε ‖ x - a ‖ .
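As a numerical sanity check of this estimate, the following sketch evaluates the quotient from the definition for the introductory example F (x_{1}, x_{2}) = x_{1}^{2} - x_{2}^{2}. The candidate derivative L (h) = 2 a_{1} h_{1} - 2 a_{2} h_{2}, the base point, and the direction of approach are assumptions chosen for illustration; they are not taken from the text.

```python
import math

def F(x):
    """Introductory example F(x1, x2) = x1^2 - x2^2."""
    return x[0] ** 2 - x[1] ** 2

def L(a, h):
    """Assumed candidate for DF(a) applied to h: 2*a1*h1 - 2*a2*h2."""
    return 2 * a[0] * h[0] - 2 * a[1] * h[1]

def quotient(a, h):
    """|F(a + h) - F(a) - L(h)| / ||h||, the quotient in the definition."""
    shifted = (a[0] + h[0], a[1] + h[1])
    return abs(F(shifted) - F(a) - L(a, h)) / math.hypot(h[0], h[1])

a = (1.0, 2.0)              # illustrative base point
direction = (1.0, -0.5)     # illustrative direction of approach
sizes = [1e-1, 1e-2, 1e-3, 1e-4]
quotients = [quotient(a, (t * direction[0], t * direction[1])) for t in sizes]

for t, q in zip(sizes, quotients):
    print(f"t = {t:.0e}  quotient = {q:.3e}")
```

For this F the numerator equals | h_{1}^{2} - h_{2}^{2} |, which is at most ‖ h ‖^{2}, so the quotient is bounded by ‖ h ‖ and shrinks with the step size.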

If

F : U \to R^{m}

is differentiable at each

x \in U

then

x \mapsto D F (x)

is a mapping from

U

to the space of linear maps from

R^{n}

to

R^{m}

. In other words, if we denote by

L (R^{n}; R^{m})

the space of linear maps from

R^{n}

to

R^{m}

then we have a well-defined mapping

D F : U \to L (R^{n}; R^{m})

called the derivative of

F

on

U

which assigns the derivative of

F

at each

x \in U

. We now relate the derivative of

F

with the derivatives of its component functions. To that end, we need to recall some basic facts from linear algebra and the definition of the partial derivative. For the latter, recall that a function

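f : U \subset R^{n} \to R

has a partial derivative at

a \in U

with respect to

x_{i}

if the following limit exists

lim_{t \to 0} \frac{f (a + t e_{i}) - f (a)}{t}

or equivalently, if there exists a number

m \in R

such that

lim_{t \to 0} \frac{| f (a + t e_{i}) - f (a) - m t |}{| t |} = 0

where

e_{i}

denotes the

i

th standard basis vector in

R^{n}

. We then denote

\frac{\partial f}{\partial x_{i}} (a) = m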

. Now, given any linear map

, the action of

on vectors in

can be represented as matrix-vector multiplication once we choose a basis for

and

. Specifically, if we choose the most convenient bases in

and

, namely the standard bases, then

where

and the

entry of the matrix

is the

th component of the vector

. We can now prove the following.

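Let

U \subset R^{n}

be open and suppose that

F : U \to R^{m}

is differentiable at

a \in U

, and write

F = (f_{1}, f_{2}, \dots, f_{m})

. Then the partial derivatives

\frac{\partial f_{j}}{\partial x_{i}} (a)

exist, and the matrix representation of

D F (a)

in the standard bases in

R^{n}

and

R^{m}

is

\begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} & \cdots & \frac{\partial f_{1}}{\partial x_{n}} \\ \frac{\partial f_{2}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{2}} & \cdots & \frac{\partial f_{2}}{\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{m}}{\partial x_{1}} & \frac{\partial f_{m}}{\partial x_{2}} & \cdots & \frac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}

where all partial derivatives are evaluated at

a

. The matrix above is called the Jacobian matrix of F at a.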

Let

denote the

entry of the matrix representation of

in the standard bases in

and

, that is,

is the

th component of

. By definition of differentiability, it holds that

Let

where

is the

th standard basis vector. Since

is open,

provided

is sufficiently small. Then since

iff

we have

It follows that each component of the vector

tends to

. Hence, for each

we have

Hence,

exists and

as claimed.
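The relationship between the derivative and the partial derivatives can be checked numerically. The sketch below approximates each partial derivative by a central difference quotient along a standard basis vector and compares the result with a hand-computed Jacobian. The mapping F, built from two of the introductory examples, the base point, and the step size are illustrative assumptions.

```python
import math

def F(x):
    """Illustrative map F : R^2 -> R^2 with components x1^2 - x2^2 and sin(x1) sin(x2)."""
    x1, x2 = x
    return [x1 ** 2 - x2 ** 2, math.sin(x1) * math.sin(x2)]

def jacobian_exact(x):
    """Hand-computed Jacobian: row j holds the partial derivatives of f_j."""
    x1, x2 = x
    return [
        [2 * x1, -2 * x2],
        [math.cos(x1) * math.sin(x2), math.sin(x1) * math.cos(x2)],
    ]

def jacobian_fd(func, x, t=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += t      # a + t e_i
        x_minus[i] -= t     # a - t e_i
        f_plus, f_minus = func(x_plus), func(x_minus)
        for j in range(m):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * t)
    return J

a = [0.7, -1.3]  # illustrative base point
J_exact = jacobian_exact(a)
J_numeric = jacobian_fd(F, a)
max_error = max(
    abs(J_exact[j][i] - J_numeric[j][i]) for j in range(2) for i in range(2)
)
print("largest entrywise difference:", max_error)
```

Column i of the numerical matrix is the difference quotient along e_{i}, which mirrors the proof: restricting the limit in the definition to the direction e_{i} produces the i th column of the matrix representation.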

It is customary to write

since for any

the vector

is the Jacobian matrix of

multiplied by

(all partials are evaluated at

). When not explicitly stated, the matrix representation of

will always mean the Jacobian matrix representation. We now prove that differentiability implies continuity. To that end, we first recall that if

and

then

The proof of this fact is identical to the one in Example 9.4.16. In particular, if

then

Let

be an open set. If

is differentiable at

then

is continuous at

Let

. Then there exists

such that if

then

Then if

then

and thus

provided

Hence,

is continuous at

Notice that Theorem 10.1.2 says that if

exists then all the relevant partials exist. However, it does not generally hold that if all the relevant partials exist then

exists. The reason is that partial derivatives are derivatives along the coordinate axes whereas, as seen from the definition, the limit used to define

is along any direction that

Consider the function

defined as

We determine whether

and

exist. To that end, we compute

Therefore,

and

exist and are both equal to zero. It is straightforward to show that

is not continuous at

and therefore not differentiable at

The previous example shows that existence of partial derivatives is a fairly weak assumption with regard to differentiability and, in fact, even with regard to continuity. The following theorem gives a sufficient condition for

to exist in terms of the partial derivatives.

Let

be an open set and consider

with

. If each partial derivative function

exists and is continuous on

then

is differentiable on

We will omit the proof of Theorem 10.1.5.
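To see why Theorem 10.1.5 asks for continuous partial derivatives and not merely for their existence, the sketch below uses the standard function f (x, y) = x y / (x^{2} + y^{2}) with f (0, 0) = 0. This particular function is an assumption chosen for illustration and may differ from the one used in the example above.

```python
def f(x, y):
    """Standard counterexample: f(x, y) = x*y / (x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_quotients(t):
    """Difference quotients at the origin along the x-axis and the y-axis."""
    along_x = (f(t, 0.0) - f(0.0, 0.0)) / t
    along_y = (f(0.0, t) - f(0.0, 0.0)) / t
    return along_x, along_y

steps = [10.0 ** (-k) for k in range(1, 8)]
quotients = [partial_quotients(t) for t in steps]
diagonal_values = [f(t, t) for t in steps]

print("axis difference quotients:", quotients[:3])
print("values along the diagonal:", diagonal_values[:3])
```

Along the coordinate axes the function vanishes, so both difference quotients are zero and both partial derivatives exist at the origin. Along the diagonal the function is constantly 1/2, so it is not continuous at the origin and therefore not differentiable there.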

Let

be defined by

Explain why

exists for each

and find

It is clear that the component functions of

that are given by

, and

have partial derivatives that are continuous on all of

. Hence,

is differentiable on

. Then

Prove that the given function is differentiable on

We compute

and thus

. A similar computation shows that

. On the other hand, if

then

To prove that

exists for any

, it is enough to show that

and

are continuous on

(Theorem 10.1.5). It is clear that

and

are continuous on the open set

and thus

exists on

. Now consider the continuity of

. Using polar coordinates

and

, we can write

Now

if and only if

and thus

In other words,

and thus

is continuous at

. A similar computation shows that

is continuous at

. Hence, by Theorem 10.1.5,

exists on

is differentiable on

and

, then

is called the gradient of

and we write

instead of

. Hence, in this case,

On the other hand, if

and

then

is a curve in

. In this case, it is customary to use lower-case letters such as

, or

instead of

, and use

for the domain instead of

. In any case, since

is a function of one variable we use the notation

and the derivative of

is denoted by

where all derivatives are derivatives of single-variable-single-valued functions.

Exercises

Let

be differentiable functions at

. Prove by definition that

is differentiable at

and that

Recall that a mapping

is said to be linear if

and

, for all

and

. Prove that if

is linear then

for all

Let

and suppose that there exists

such that

for all

. Prove that

is differentiable at

and that

Determine if the given function is differentiable at

Compute

Differentiation Rules and the MVT

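Let

U \subset R^{n}

and

W \subset R^{m}

be open sets. Suppose that

F : U \to W

is differentiable at

a \in U

, and

G : W \to R^{p}

is differentiable at

F (a)

. Then

G \circ F : U \to R^{p}

is differentiable at

a

and

D (G \circ F) (a) = D G (F (a)) \circ D F (a)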
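In matrix terms, the chain rule says that the Jacobian of a composition is the product of the Jacobians. The sketch below checks this numerically for two illustrative maps F and G on R^{2}; the maps, the base point, and the finite-difference step are assumptions chosen for this example and are not the ones from the text.

```python
import math

def F(x):
    """Illustrative inner map F : R^2 -> R^2."""
    return [x[0] ** 2 - x[1] ** 2, math.sin(x[0]) * math.sin(x[1])]

def DF(x):
    """Hand-computed Jacobian of F."""
    return [
        [2 * x[0], -2 * x[1]],
        [math.cos(x[0]) * math.sin(x[1]), math.sin(x[0]) * math.cos(x[1])],
    ]

def G(y):
    """Illustrative outer map G : R^2 -> R^2."""
    return [y[0] + y[1], y[0] * y[1]]

def DG(y):
    """Hand-computed Jacobian of G."""
    return [[1.0, 1.0], [y[1], y[0]]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def jacobian_fd(func, x, t=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += t
        x_minus[i] -= t
        f_plus, f_minus = func(x_plus), func(x_minus)
        for j in range(m):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * t)
    return J

a = [0.7, -1.3]  # illustrative base point
chain_rule_matrix = matmul(DG(F(a)), DF(a))
numeric_matrix = jacobian_fd(lambda x: G(F(x)), a)
max_error = max(
    abs(chain_rule_matrix[j][i] - numeric_matrix[j][i])
    for j in range(2)
    for i in range(2)
)
print("largest entrywise difference:", max_error)
```

Note the order of the factors: the Jacobian of the outer map is evaluated at F (a) and multiplies the Jacobian of the inner map from the left.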

Verify the chain rule for the composite function

where

and

are

An important special case of the chain rule is the composition of a curve

with a function

. The composite function

is a single-variable and single-valued function. In this case, if

is defined for all

and

exists at each

then

In the case that

and

is a unit vector, that is,

, then

is called the directional derivative of at in the direction .

Let

and

be differentiable and suppose that

. Prove that if

then

where

Below is a version of the product rule for multi-variable functions.

Let

be open and suppose that

and

are differentiable at

. Then the function

is differentiable at

and

Verify the product rule for

and

are

Let

be differentiable functions. Find an expression of

in terms of

, and

Let

be a differentiable function. Suppose that

is differentiable. Prove that

for all

if and only if

for all

Recall the mean value theorem (MVT) on

R

. If

f : [a, b] \to R

is continuous on

[a, b]

and differentiable on

(a, b)

then there exists

c \in (a, b)

such that

f (b) - f (a) = f^{'} (c) (b - a)

. The MVT does not generally hold for a function

without some restrictions on

and, more importantly, on

. For instance, consider

defined by

. Then

while

and there is no

such that

. With regards to the domain

, we will be able to generalize the MVT for points

provided all points on the line segment joining

and

are contained in

. Specifically, the line segment joining

is the set of points

Hence, the image of the curve

given by

is the line segment joining

and

. Even if

is open, the line segment joining

may not be contained in

(see Figure 10.1).

Figure 10.1 (figures/line-segment.svg): a line segment joining two points of an open set that is not contained in the set.

Let

U \subset R^{n}

be open and assume that

f : U \to R

is differentiable on

U

. Let

x, y \in U

and suppose that the line segment joining

x, y

is contained entirely in

U

. Then there exists

c

on the line segment joining

x

and

y

such that

f (y) - f (x) = D f (c) (y - x)

Let

for

. By assumption,

for all

. Consider the function

. Then

is continuous on

and by the chain rule is differentiable on

. Hence, applying the MVT on

there exists

such that

. Now

and

, and by the chain rule,

Hence,

and the proof is complete.
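The proof reduces the statement to the single-variable MVT along the segment, and the same reduction can be used to locate a suitable point numerically. The sketch below does this for the illustrative function f (x_{1}, x_{2}) = \sin (x_{1}) \sin (x_{2}) and two illustrative endpoints. The names used here and the use of bisection are assumptions; bisection applies only because the gap changes sign between the endpoints in this example.

```python
import math

def f(p):
    """Illustrative function f(x1, x2) = sin(x1) sin(x2)."""
    return math.sin(p[0]) * math.sin(p[1])

def grad_f(p):
    """Hand-computed gradient of f."""
    return (math.cos(p[0]) * math.sin(p[1]), math.sin(p[0]) * math.cos(p[1]))

def segment_point(x, y, t):
    """Point (1 - t) x + t y on the segment joining x and y."""
    return tuple(xi + t * (yi - xi) for xi, yi in zip(x, y))

def mvt_gap(x, y, t):
    """Df(c)(y - x) - (f(y) - f(x)) at c = segment_point(x, y, t)."""
    c = segment_point(x, y, t)
    step = tuple(yi - xi for xi, yi in zip(x, y))
    return sum(g * s for g, s in zip(grad_f(c), step)) - (f(y) - f(x))

def find_mvt_parameter(x, y, iterations=60):
    """Bisection on [0, 1]; requires a sign change of the gap at the endpoints."""
    lo, hi = 0.0, 1.0
    if mvt_gap(x, y, lo) * mvt_gap(x, y, hi) > 0:
        raise ValueError("no sign change at the endpoints; bisection does not apply")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mvt_gap(x, y, lo) * mvt_gap(x, y, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x, y = (0.0, 0.0), (1.0, 1.0)   # illustrative endpoints
t_star = find_mvt_parameter(x, y)
c = segment_point(x, y, t_star)
print("parameter:", t_star, "point on the segment:", c)
```

The theorem guarantees that such a point exists but does not say where it is or that it is unique; the bisection above only finds one such point for this particular choice of function and endpoints.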

Let

be open and assume that

is differentiable on

. Let

and suppose that the line segment joining

is contained entirely in

. Then there exists

on the line segment joining

and

such that

for

Apply the MVT to each component function

A set

is said to be convex if for any

the line segment joining

and

is contained in

. Let

be differentiable. Prove that if

is an open convex set and

then

is constant on

Exercises

Let

be an open set satisfying the following property: for any

there is a continuous curve

such that

is differentiable on

and

Give an example of a non-convex set satisfying the above property.
Prove that if U satisfies the above property and F : U \to R^{m} is differentiable on U with D F = 0 then F is constant on U.

The Space of Linear Maps

Let

be an open subset of

. Recall that if

is differentiable at each

then

denotes the derivative of

. The space of linear maps

is a vector space which after

Solutions to Differential Equations

A differential equation on

is an equation of the form

where

is a given function and

is the unknown in

. A solution to

is a curve

such that

where

is an interval, possibly infinite. If

is defined

Let

be an open set and let

be a differentiable function with a continuous derivative

High-Order Derivatives

In this section, we consider high-order derivatives of a differentiable mapping

. To do this, we will need to make an excursion into the world of multilinear algebra. Even though we will discuss high-order derivatives for functions on Euclidean spaces, it will be convenient to first work with general vector spaces.

Let

and

be vector spaces. A mapping

is said to be a k-multilinear map if

is linear in each variable separately. Specifically, for any

, and any

for

, the mapping

defined by

is a linear mapping.

A

1

-multilinear mapping is just a linear mapping. A

2

-multilinear mapping is called a bilinear mapping. Hence,

is bilinear if

and

for all

, and

. Roughly speaking, a multilinear mapping is essentially a special type of polynomial multivariable function. We will make this precise after presenting a few examples.

Consider

defined as

. As can be easily verified,

is bilinear. On the other hand, if

then

is not bilinear since for example

in general, or

in general. What about

Let

be a set of vectors in

and suppose that

and

. If

is bilinear then expand

so that it depends only on

and

for

Let

be a

matrix and define

. Show that

is bilinear. For instance, if say

then

Notice that

is a polynomial in the components of

and

The function that returns the determinant of a matrix is multilinear in the columns of the matrix. Specifically, if say

then

and if

then

These facts are proved by expanding the determinant along the first column. The same is true if we perform the same computation with a different column of

. In the case of a

matrix

we have

and if

is a

matrix with columns

, and

then

We now make precise the statement that a multilinear mapping is a (special type of) multivariable polynomial function. For simplicity, and since this will be the case when we consider high-order derivatives, we consider

-multilinear mappings

. For a positive integer

let

where on the right-hand-side

appears

-times. Let

denote the space of

-multilinear maps from

. It is easy to see that

is a vector space under the natural notion of addition and scalar

-multiplication. In what follows we consider the case

; the general case is similar but requires more notation. Hence, suppose that

is a multilinear mapping and let

, and

. Then

where

is the

th standard basis vector of

, and similarly for

and

. Therefore, by multilinearity of

we have

Thus, to compute

for any

, we need only know the values

for all triples

with

. If we set

where the superscripts are not exponents but indices, then from our computation above

Notice that the component functions of

are multilinear, specifically, the mapping

is multilinear for each

. The

numbers

for

and

completely determine the multilinear mapping

, and we call these the coefficients of the multilinear mapping

in the standard bases.

The general case

is just more notation. If

-multilinear then there exist

unique coefficients

, where

and

, such that for any vectors

it holds that

where

are the standard basis vectors in

A multilinear mapping

is said to be symmetric if the value of

is unchanged after an arbitrary permutation of the inputs to

. In other words,

is symmetric if for any

it holds that

for any permutation

. For instance, if

is symmetric then for any

it holds that

Consider

defined by

Then

and therefore

is symmetric. Notice that

and the matrix

is symmetric.
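The link between symmetric bilinear maps and symmetric matrices can be explored with a short sketch. It assumes the bilinear map has the form B (u, v) = u^{T} A v for a square matrix A; the two matrices and the test vectors below are illustrative choices, not the ones from the example above. Integer entries are used so that all comparisons are exact.

```python
def bilinear(A, u, v):
    """Evaluate B(u, v) = u^T A v as the double sum of u_i * A_ij * v_j."""
    return sum(
        u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v))
    )

def combine(alpha, u, w):
    """Return the vector alpha * u + w."""
    return [alpha * ui + wi for ui, wi in zip(u, w)]

A_symmetric = [[1, 2], [2, 5]]      # equal to its transpose
A_not_symmetric = [[1, 2], [0, 5]]  # not equal to its transpose

u, v, w = [1, -2], [3, 4], [0, 7]
alpha = 3

# Linearity in the first argument: B(alpha u + w, v) = alpha B(u, v) + B(w, v).
linear_in_first = (
    bilinear(A_not_symmetric, combine(alpha, u, w), v)
    == alpha * bilinear(A_not_symmetric, u, v) + bilinear(A_not_symmetric, w, v)
)
# Linearity in the second argument: B(u, alpha v + w) = alpha B(u, v) + B(u, w).
linear_in_second = (
    bilinear(A_not_symmetric, u, combine(alpha, v, w))
    == alpha * bilinear(A_not_symmetric, u, v) + bilinear(A_not_symmetric, u, w)
)

symmetric_values = (bilinear(A_symmetric, u, v), bilinear(A_symmetric, v, u))
other_values = (bilinear(A_not_symmetric, u, v), bilinear(A_not_symmetric, v, u))
print(linear_in_first, linear_in_second, symmetric_values, other_values)
```

Both matrices define bilinear maps, but only the symmetric matrix gives B (u, v) = B (v, u) for the chosen vectors.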

Having introduced the very basics of multilinear mappings, we can proceed with discussing high-order derivatives of vector-valued multivariable functions. Suppose then that

is differentiable on the open set

and as usual let

denote the derivative. Now

is a finite dimensional vector space and can be equipped with a norm (all norms on a given finite dimensional vector space are equivalent). Thus, we can speak of differentiability of

, namely,

is differentiable at

if there exists a linear mapping

such that

If such an

exists then we denote it by

. To simplify the notation, we write instead

. Hence,

is differentiable at

if there exists a linear mapping

such that

To say that

is a linear mapping from

is to say that

Let us focus our attention on the space

. If

then

for each

, and moreover the assignment

is linear, i.e.,

. Now, since

, we have that

In other words, the mapping

is bilinear! Hence,

defines (uniquely) a bilinear map

and the assignment

is linear. Conversely, to any bilinear map

we associate an element

defined as

and the assignment

is linear. We have therefore proved the following.

Let

and

be vector spaces. The vector space

is isomorphic to the vector space

of multilinear maps from

The punchline is that

can be viewed in a natural way as a bilinear mapping

and thus from now on we write

instead of the more cumbersome

. We now determine a coordinate expression for

. First of all, if

then

where

is the standard basis of

. By linearity of the derivative and the product rule of differentiation, we have that

and also

. Therefore,

This shows that we need only consider

for

-valued functions

. Now,

and thus the Jacobian of

is (Theorem 10.1.2)

Therefore,

Therefore, for any

and

, by multilinearity we have

Now, if all second order partials of

are defined and continuous on

we can say more. Let us first introduce some terminology. We say that

is of class if all partial derivatives up to and including order

are continuous functions on

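Let

U \subset R^{n}

be an open set and suppose that

f : U \to R

is of class

C^{2}

. Then

\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} = \frac{\partial^{2} f}{\partial x_{j} \partial x_{i}}

for all

1 \leq i, j \leq n

. Consequently,

D^{2} f (x)

is a symmetric bilinear map on

R^{n} \times R^{n}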
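The equality of mixed second-order partial derivatives can be checked numerically. The sketch below differentiates each hand-computed first partial derivative of the illustrative function f (x_{1}, x_{2}) = \sin (x_{1}) \sin (x_{2}) in the other variable by a central difference. The function, the base point, and the step size are assumptions chosen for illustration.

```python
import math

def df_dx1(x1, x2):
    """First partial of f(x1, x2) = sin(x1) sin(x2) with respect to x1."""
    return math.cos(x1) * math.sin(x2)

def df_dx2(x1, x2):
    """First partial of f(x1, x2) = sin(x1) sin(x2) with respect to x2."""
    return math.sin(x1) * math.cos(x2)

def central_difference(g, s, t=1e-5):
    """Central-difference approximation of g'(s) for a one-variable function g."""
    return (g(s + t) - g(s - t)) / (2 * t)

a1, a2 = 0.7, -1.3  # illustrative base point

# Differentiate df/dx1 in the x2 direction, and df/dx2 in the x1 direction.
mixed_21 = central_difference(lambda s: df_dx1(a1, s), a2)
mixed_12 = central_difference(lambda s: df_dx2(s, a2), a1)
exact = math.cos(a1) * math.cos(a2)

print("d/dx2 of df/dx1:", mixed_21)
print("d/dx1 of df/dx2:", mixed_12)
print("hand-computed value:", exact)
```

The two approximations come from different first partials, so their agreement is a genuine check of the symmetry rather than an artifact of the difference formula.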

If we now go back to a vector-valued function

with components

, then if

exists at

then

Higher-order derivatives of

can be treated similarly. If

is differentiable at

then we denote the derivative at

. Then

is a linear map, that is,

The vector space

is isomorphic to the space of

-multilinear maps

. The value of

is denoted by

. Moreover,

is a symmetric

-multilinear map at each

is of class

. If

is of class

then for vectors

we have

where the summation is over all

-tuples

where

. Hence, there are

terms in the above summation. In the case that

, the above expression takes the form

Compute

, and

. Also compute

We compute that

and

and then

Then,

then

Taylor's Theorem

Taylor's theorem for a function

is as follows.

Let

be an open set and suppose that

is of class

. Let

and suppose that the line segment between

and

lies entirely in

. Then there exists

on the line segment such that

where

Furthermore, if

in Taylor's theorem then

and

We call

the th order Taylor polynomial of centered at and

the

th order remainder term. Hence, Taylor's theorem says that

Since

, for

close to

we get an approximation

Moreover, since

is continuous, there is a constant

such that if

is sufficiently close to

then the remainder term satisfies the bound

From this it follows that
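The decay of the remainder can be observed numerically. The sketch below builds the second-order Taylor polynomial of the illustrative function f (x_{1}, x_{2}) = \sin (x_{1}) \sin (x_{2}) from its hand-computed gradient and matrix of second partials, and then divides the remainder by the square of the step length. The function, the center, and the direction are assumptions chosen for this example.

```python
import math

def f(x):
    """Illustrative function f(x1, x2) = sin(x1) sin(x2)."""
    return math.sin(x[0]) * math.sin(x[1])

def gradient(x):
    """Hand-computed first partial derivatives of f."""
    return [math.cos(x[0]) * math.sin(x[1]), math.sin(x[0]) * math.cos(x[1])]

def second_partials(x):
    """Hand-computed matrix of second partial derivatives of f."""
    s1, c1 = math.sin(x[0]), math.cos(x[0])
    s2, c2 = math.sin(x[1]), math.cos(x[1])
    return [[-s1 * s2, c1 * c2], [c1 * c2, -s1 * s2]]

def taylor2(a, h):
    """Second-order Taylor polynomial of f centered at a, evaluated at a + h."""
    g, H = gradient(a), second_partials(a)
    linear = sum(g[i] * h[i] for i in range(2))
    quadratic = 0.5 * sum(H[i][j] * h[i] * h[j] for i in range(2) for j in range(2))
    return f(a) + linear + quadratic

a = (0.7, -1.3)          # illustrative center
direction = (0.6, 0.8)   # unit vector, so the step length equals t
sizes = [1e-1, 1e-2, 1e-3]

ratios = []
for t in sizes:
    h = (t * direction[0], t * direction[1])
    remainder = f((a[0] + h[0], a[1] + h[1])) - taylor2(a, h)
    ratios.append(abs(remainder) / t ** 2)

for t, r in zip(sizes, ratios):
    print(f"step length = {t:.0e}  |remainder| / length^2 = {r:.3e}")
```

The ratio shrinks roughly in proportion to the step length, which is what a remainder bound of third order in the step length predicts.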

Compute the third-order Taylor polynomial of

centered at

Most of the work has been done in Example 10.5.10. Evaluating all derivatives at

we find that

Therefore,

Exercises

Find the 2nd order Taylor polynomial of the function

centered at

A function

is called a homogeneous function of degree

if for all

and

it holds that

. Prove that if

is differentiable at

then the mapping

is a homogeneous function of degree

The Inverse Function Theorem

A square linear system

or in vector form

where the unknown is

, has a unique solution if and only if

exists if and only if

. In this case, the solution is

. Another way to say this is that the mapping

has a global inverse given by

. Hence, invertibility of

completely determines whether

is invertible. Consider now a system of equations

where

is nonlinear. When is it possible to solve for

in terms of

, that is, when does

exist? In general, this is a difficult problem and we cannot expect global invertibility even when assuming the most desirable conditions on

. Even in the 1D case, we cannot expect global invertibility. For instance,

is not globally invertible but is so on any interval where

. For instance, on the interval

, we have that

and

. In any neighborhood where

, for instance, at

is not invertible. However, having a non-zero derivative is not necessary for invertibility. For instance, the function

has

but

has an inverse locally around

; in fact it has a global inverse

. Let's go back to the 1D case and see if we can say something about the invertibility of

locally about a point

such that

. Assume that

is continuous on

(or on an open set containing

). Then there is an interval

such that

for all

. Now if

and

, then by the Mean Value Theorem, there exists

in between

and

such that

Since

and

then

. Hence, if

then

and this proves that

is injective on

. Therefore, the function

has an inverse

where

. Hence, if

has a local inverse at

. In fact, we can say even more, namely, one can show that

is also differentiable. Then, since

for

, by the chain rule we have

and therefore since

for all

we have
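The one-dimensional relation between the derivative of an inverse and the reciprocal of the derivative can be checked numerically. The sketch below uses the illustrative function f (x) = x^{3} + x, whose derivative 3 x^{2} + 1 never vanishes, computes its inverse by bisection, and compares a difference quotient of the inverse with 1 / f^{'} (x). The function, the bracketing interval, and the step size are assumptions chosen for this example.

```python
def f(x):
    """Illustrative strictly increasing function."""
    return x ** 3 + x

def df(x):
    """Derivative of f; it is at least 1 everywhere."""
    return 3 * x ** 2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iterations=200):
    """Solve f(x) = y by bisection, assuming y lies between f(lo) and f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = 1.2        # illustrative point
y0 = f(x0)
t = 1e-5        # finite-difference step in the y variable

inverse_derivative_numeric = (f_inverse(y0 + t) - f_inverse(y0 - t)) / (2 * t)
inverse_derivative_predicted = 1.0 / df(x0)

print("difference quotient of the inverse:", inverse_derivative_numeric)
print("reciprocal of the derivative:", inverse_derivative_predicted)
```

Bisection is used only because f is strictly increasing on the bracketing interval; it is a convenience for this example and not part of the argument in the text.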

The following theorem is a generalization of this idea.

Let

be an open set and let

be of class

. Suppose that

for

. Then there exists an open set

containing

such that

is open and

is invertible. Moreover, the inverse function

is also

and for

and

we have

Prove that

is locally invertible at all points

Clearly,

exists for all

since all partials of the components of

are continuous on

. A direct computation gives

and thus

. Clearly,

if and only if

. Therefore, by the Inverse Function theorem, for each non-zero

there exists an open set

containing

such that

is invertible. In this very special case, we can find the local inverse of

about some

. Let

, that is,

then

and therefore

By the quadratic formula,

Since

we must take

and therefore

Hence, provided

and

then

Exercises

Let

be defined by

for

Prove that the range of is . Hint: Think polar coordinates.
Prove that is not injective.
Prove that is locally invertible at every .

Can the system of equations

be solved for

in terms of

near