MATH 329 - Differentiation

I. Essential Notions and Concepts

Definition (Differentiability of functions on $\mathbb R^n$). Suppose $E\subset \mathbb R^n$ is open, and $f: E\to \mathbb R^m$, and $\mathbf x\in E$. The map $f$ is said to be differentiable at $\mathbf x$ if there exists a linear transformation $A\in L(\mathbb R^n, \mathbb R^m)$ such that $$\lim_{\mathbf h\to 0} \frac{|f(\mathbf x+\mathbf h)-f(\mathbf x)-A \mathbf h|}{|\mathbf h|} = 0.$$ In this case, the linear operator $A$ is denoted by $f'(\mathbf x)$ and is called the derivative of $f$ at $\mathbf x$.

Remarks.

The limit in the definition is a limit in $\mathbb R^n$; in particular, one cannot choose the path along which $\mathbf h$ tends to zero!
Of course, the definition above relies on the fact that the limit can be zero for at most one such linear operator $A$; to prove this, assume there are two and verify that they must actually be the same.
If $f$ is differentiable at $\mathbf x$, then it is continuous at $\mathbf x$; this is sometimes useful to disprove differentiability.
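
Example (sketch of the uniqueness argument). A minimal sketch of the verification suggested in the second remark, assuming that two operators $A$ and $B$ both satisfy the definition at $\mathbf x$ (the name $B$ is introduced here only for illustration). By the triangle inequality, $$\frac{|(A-B)\mathbf h|}{|\mathbf h|} \le \frac{|f(\mathbf x+\mathbf h)-f(\mathbf x)-A\mathbf h|}{|\mathbf h|} + \frac{|f(\mathbf x+\mathbf h)-f(\mathbf x)-B\mathbf h|}{|\mathbf h|} \to 0 \quad \text{as } \mathbf h\to 0.$$ For a fixed $\mathbf h\ne 0$ and any real $t\ne 0$, linearity gives $$\frac{|(A-B)(t\mathbf h)|}{|t\mathbf h|} = \frac{|(A-B)\mathbf h|}{|\mathbf h|},$$ which does not depend on $t$. Letting $t\to 0$ shows that this constant is $0$, so $(A-B)\mathbf h=0$ for every $\mathbf h$, that is, $A=B$.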

Theorem (Chain rule). Let $E\subset \mathbb R^n$ be open, and $f:E\to \mathbb R^m$ be differentiable at $\mathbf x_0\in E$. Let also $V \subset \mathbb R^m$ be open with $f(E)\subset V$ and $g:V\to \mathbb R^k$ be differentiable at $f(\mathbf x_0)$. Then the map $g\circ f: E\to \mathbb R^k$ is differentiable at $\mathbf x_0$ and $$(g\circ f)'(\mathbf x_0) = g'(f(\mathbf x_0)) f'(\mathbf x_0).$$

Remarks.

The right-hand side is of course the linear operator obtained by composing $g'(f(\mathbf x_0))$ and $f'(\mathbf x_0)$, in this order!
Note that the operators on each side are in $L(\mathbb R^n, \mathbb R^k)$.
The chain rule can be interpreted in terms of the identity between the matrix $[(g\circ f)'(\mathbf x_0)]$ and the product of matrices $[g'(f(\mathbf x_0))] \times [f'(\mathbf x_0)]$.

Definition (Partial derivatives). Let $f:E\to \mathbb R^m$ for an open $E\subset \mathbb R^n$. Let $(\mathbf e_1,\dots, \mathbf e_n)$ and $(\mathbf u_1,\dots, \mathbf u_m)$ be the standard bases of $\mathbb R^n$ and $\mathbb R^m$, respectively. Then the $j$th partial derivative $D_jf_i$ of the $i$th component $f_i:E\to \mathbb R$ of $f$ is defined as $$D_jf_i(\mathbf x) = \lim_{t\to 0} \frac{f_i(\mathbf x + t \mathbf e_j)-f_i(\mathbf x)}{t},$$ provided that the limit exists.

Remarks.

Even if all the partial derivatives exist at a point $\mathbf x \in \mathbb R^n$, the function $f$ may fail to be differentiable: indeed, for the partial derivatives to exist, the function only needs to be nice on the $n$ orthogonal lines parallel to the $n$ axes that meet at the point $\mathbf x$; the function does not even need to be defined anywhere else!
If we write $\mathbf x=(x_1,x_2,\dots, x_n)$ and see $f(\mathbf x)$ as $f(x_1,\dots, x_n)$ then one usually writes $$D_j f_i(\mathbf x)=\frac{\partial f_i} {\partial x_j} (x_1,x_2,\dots, x_n).$$
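
Example (partial derivatives without differentiability). A standard illustration of the first remark, given here as a sketch; the particular function is chosen for convenience and is not taken from the notes. Let $f:\mathbb R^2\to \mathbb R$ be defined by $$f(x,y)=\frac{xy}{x^2+y^2} \text{ for } (x,y)\ne (0,0), \qquad f(0,0)=0.$$ Since $f(t,0)=f(0,t)=0$ for every $t$, $$D_1f(0,0)=\lim_{t\to 0}\frac{f(t,0)-f(0,0)}{t}=0, \qquad D_2f(0,0)=\lim_{t\to 0}\frac{f(0,t)-f(0,0)}{t}=0,$$ so both partial derivatives exist at the origin. However, $f(t,t)=1/2$ for every $t\ne 0$, so $f$ is not continuous at $(0,0)$ and therefore cannot be differentiable there.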

Theorem. Let $E\subset \mathbb R^n$ be open, and $f:E\to \mathbb R^m$ be differentiable at $\mathbf x\in E$. Then all the partial derivatives $D_jf_i$ exist at the point $\mathbf x$ and we have, for all $j\in \{1,\dots, n\}$, $$ f'(\mathbf x) \mathbf e_j = \sum_{i=1}^m (D_jf_i)(\mathbf x) \mathbf u_i.$$

Remarks.

This says that the linear map $f'(\mathbf x)$, if it exists, is determined by all the partial derivatives;
More precisely if $f$ is differentiable, the matrix of $f'(x)$ can be written as $$[f'(x)]= \begin{pmatrix} D_1 f_1(x) & D_2f_1(x) & \dots & D_n f_1(x)\\ D_1 f_2(x) & D_2f_2(x) & \dots & D_n f_2(x)\\ \vdots & & & \vdots\\ D_1 f_m(x) & D_2 f_m(x) & \dots & D_n f_m(x) \end{pmatrix},$$ where the $j$th column is the vector of the derivatives of all the components of $f$ with respect to the $j$th component of $\mathbf x$.
In particular, the chain rule can be expressed in terms of a product of such matrices. With the assumptions in the theorem on the chain rule above: $$ [(g\circ f)'(\mathbf x)] = \begin{pmatrix} D_1 g_1(f(\mathbf x)) & \dots & D_m g_1(f(\mathbf x))\\ D_1 g_2(f(\mathbf x)) & \dots & D_m g_2(f(\mathbf x))\\ \vdots & & \vdots\\ D_1 g_k(f(\mathbf x)) & \dots & D_m g_k(f(\mathbf x))\\ \end{pmatrix} \cdot \begin{pmatrix} D_1 f_1 (\mathbf x) & \dots & D_n f_1(\mathbf x)\\ D_1 f_2 (\mathbf x) & \dots & D_n f_2(\mathbf x)\\ \vdots & & \vdots\\ D_1 f_m (\mathbf x) & \dots & D_n f_m(\mathbf x)\\ \end{pmatrix} $$
The matrix product above in turn implies another form for the chain rule: setting $\mathbf y=f(\mathbf x)$, for each $i\in \{1,\dots, k\}$ and $j\in \{1,\dots, n\}$, $$\frac{\partial (g\circ f)_i}{\partial x_j}(x_1,\dots, x_n) = \sum_{p = 1 }^m \frac{\partial g_i}{\partial y_p} (f(x_1,\dots, x_n)) \cdot \frac{\partial f_p}{\partial x_j} (x_1,\dots, x_n).$$ The meaning of the chain rule might then appear more clearly: changing $x_j$ by $\Delta x_j$ makes $f_p(\mathbf x)$ change approximately by $\Delta y_p = \partial f_p / \partial x_j(\mathbf x) \cdot \Delta x_j$, for each $p\in \{1,\dots, m\}$; but for each $p$, a change in $y_p$ by $\Delta y_p$ in turn induces a change in $g_i(\mathbf y)$ by approximately $\partial g_i / \partial y_p(\mathbf y) \cdot \Delta y_p$, and these changes are additive.
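
Example (chain rule as a matrix product). A small worked instance, with maps chosen here purely for illustration: take $n=m=2$, $k=1$, $f(r,\theta)=(r\cos\theta, r\sin\theta)$ and $g(y_1,y_2)=y_1^2+y_2^2$. Then $$[g'(\mathbf y)] = \begin{pmatrix} 2y_1 & 2y_2 \end{pmatrix}, \qquad [f'(r,\theta)] = \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix},$$ and evaluating $[g']$ at $\mathbf y=f(r,\theta)$ gives $$[(g\circ f)'(r,\theta)] = \begin{pmatrix} 2r\cos\theta & 2r\sin\theta \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix} = \begin{pmatrix} 2r & 0 \end{pmatrix}.$$ This agrees with the direct computation, since $(g\circ f)(r,\theta)=r^2$ has partial derivatives $2r$ and $0$ with respect to $r$ and $\theta$.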

Definition (Continuously differentiable on $\mathbb R^n$). Suppose $E\subset \mathbb R^n$ is open, and $f: E\to \mathbb R^m$. The function $f$ is said to be continuously differentiable on $E$ if $f$ is differentiable on $E$ and the map $f': E\to L(\mathbb R^n,\mathbb R^m)$ is continuous.

Remarks.

The continuity of $f'$ is that of a map from $E\subset \mathbb R^n$ into $L(\mathbb R^n,\mathbb R^m)$; the metric $d(\cdot,\cdot)$ on $L(\mathbb R^n,\mathbb R^m)$ that one uses is the one induced by the operator norm $\| \cdot \|$ defined by $\|A\| = \sup\{ |A \mathbf v|: |\mathbf v|\le 1 \}$, via $d(A,B)=\|A-B\|.$
Continuous differentiability is genuinely different from differentiability: there exists a function $f$ that is differentiable on an open set $E$ containing $\mathbf x_0$ but such that $f'$ is not continuous at $\mathbf x_0$ (See Exercise 16 p 241).
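
Example (differentiable but not continuously differentiable). A classical one-variable illustration, offered as a sketch; it is not necessarily the function used in the exercise cited above. Let $f:\mathbb R\to \mathbb R$ be given by $f(t)=t^2\sin(1/t)$ for $t\ne 0$ and $f(0)=0$. At the origin, $$\left|\frac{f(t)-f(0)}{t}\right| = |t\sin(1/t)| \le |t| \to 0,$$ so $f'(0)=0$. For $t\ne 0$, $$f'(t) = 2t\sin(1/t) - \cos(1/t),$$ and the term $\cos(1/t)$ has no limit as $t\to 0$. Hence $f$ is differentiable on $\mathbb R$, but $f'$ is not continuous at $0$.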

Theorem. Let $E\subset \mathbb R^n$ be open. The function $f:E\to \mathbb R^m$ is continuously differentiable if and only if all the partial derivatives $D_jf_i$, $1\le i\le m$, $1\le j\le n$, of $f$ exist and are continuous on $E$.

Definition (Jacobian). If $E\subseteq \mathbb R^n$ is open and $f:E\to \mathbb R^n$ is differentiable at $\mathbf x\in E$, the Jacobian of $f$ at $x$, denoted by $J_f(x)$, is defined as the determinant of the linear operator $f'(x)\in L(\mathbb R^n)$.

Remarks.

For the Jacobian of $f$ to make sense, it is crucial that $f$ be defined on a subset of $\mathbb R^n$ and take values in $\mathbb R^n$, since the matrix $[f'(x)]$ should be a square matrix!
The notation $$J_f=\frac{\partial(f_1,f_2,\dots, f_n)}{\partial(x_1,\dots, x_n)}$$ is also useful.
The operator $f'(x)$ is invertible if and only if $J_f(x)\ne 0$.
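
Example (a Jacobian). As a quick illustration, with the map chosen here for convenience, let $f:\mathbb R^2\to \mathbb R^2$ be the polar-coordinates map $f(r,\theta)=(r\cos\theta, r\sin\theta)$. Then $$J_f(r,\theta) = \det \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r,$$ so $f'(r,\theta)$ is invertible exactly when $r\ne 0$.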

Additional notions.

Directional derivative: for $f:E\subseteq \mathbb R^n\to \mathbb R$ and $\mathbf u$ a unit vector in $\mathbb R^n$, one defines $D_{\mathbf u} f$ by $$D_{\mathbf u}f(\mathbf x) = \lim_{t\to 0} \frac{f(\mathbf x+t \mathbf u)-f(\mathbf x)}{t},$$ when the limit exists.
If $f$ is differentiable then $D_{\mathbf u}f(x) = \sum_{i=1}^n u_i D_i f(x)$.
If $f$ is differentiable and real valued, the gradient of $f$, denoted by $\operatorname{grad} f = \nabla f$, is defined as the $n$-vector of partial derivatives of $f$: $\nabla f(x) = \sum_{i=1}^n D_if (x) \mathbf e_i$.
For $f:E\to \mathbb R$ differentiable, $D_{\mathbf u}f(x) = \nabla f(x) \cdot \mathbf u$; in particular, it follows that $|D_{\mathbf u}f(x)| \le |\nabla f(x)|$ and that $D_{\mathbf u} f(x)$ is maximized when $\nabla f(x)$ and $\mathbf u$ are parallel and in the same direction; so $\nabla f$ points in the direction in which $f$ increases fastest.
For $f:E\to \mathbb R$ of class $\mathscr C^1$, the level sets $\mathcal L(c)=\{\mathbf x: f(\mathbf x)=c\}$ are everywhere normal to the gradient $\nabla f$.
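
Example (gradient, directional derivative and level sets). A short numerical check of the last three items, for a function and a point chosen here only for illustration. Let $f(x,y)=x^2+y^2$, so that $\nabla f(x,y)=(2x,2y)$ and $\nabla f(1,2)=(2,4)$. For the unit vector $\mathbf u=(3/5,4/5)$, $$D_{\mathbf u}f(1,2) = \nabla f(1,2)\cdot \mathbf u = \frac{6}{5}+\frac{16}{5} = \frac{22}{5} \le |\nabla f(1,2)| = 2\sqrt 5.$$ The level set through $(1,2)$ is the circle $x^2+y^2=5$, whose tangent direction at $(1,2)$ is $(-2,1)$, and indeed $$\nabla f(1,2)\cdot (-2,1) = -4+4 = 0.$$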

II. Inverse and implicit function theorems

Theorem (Inverse function theorem). Let $E\subset \mathbb R^n$ be open. Suppose that $f:E\to \mathbb R^n$ is of class $\mathscr C^1$, that $a\in E$ and $f'(a)$ is invertible. Then,

there exist open sets $U,V\subseteq \mathbb R^n$ with $a\in U$ and $b=f(a)\in V$ such that $f$ is one-to-one on $U$ and $f(U)=V$;
if $g$ denotes the inverse of $f$ defined on $V$ by $g(f(x))=x$ for every $x\in U$, then $g$ is of class $\mathscr C^1$.

Remarks.

Of course, it is essential that $E\subset \mathbb R^n$ and that $f:E\to \mathbb R^n$, since if the operator $f'(\mathbf a)$ is not square, then it cannot be invertible!
The condition that $f'(a)$ be invertible is equivalent to $J_f(a)\ne 0$.
Note that $f'(a)$ being invertible is only a sufficient condition for $f$ to be one-to-one in an open set $U\ni a$; for instance, consider the function $f: t\mapsto t^3$ at the point $t=0$;
On the other hand, $f'(a)$ being invertible is a necessary condition for the inverse function $f^{-1}$ to be differentiable at $f(a)$ (See Homework 3).
Note also that the theorem only guarantees that if $J_{f}\ne 0$ on all of $E$, then $f$ is one-to-one in some open set around every point of $E$, but not that $f$ is one-to-one on $E$ (See Exercise 17 p 241).
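
Example (locally but not globally one-to-one). A standard illustration of the last remark, given as a sketch; it is not claimed to be the function of the cited exercise. Let $f:\mathbb R^2\to \mathbb R^2$ be $f(x,y)=(e^x\cos y, e^x\sin y)$. Then $$J_f(x,y) = \det \begin{pmatrix} e^x\cos y & -e^x\sin y\\ e^x\sin y & e^x\cos y \end{pmatrix} = e^{2x} \ne 0$$ at every point, so the inverse function theorem applies everywhere and $f$ is one-to-one near each point. Nevertheless $f(x,y+2\pi)=f(x,y)$ for all $(x,y)$, so $f$ is not one-to-one on $\mathbb R^2$.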

Theorem (Implicit Function Theorem). Let $E\subseteq \mathbb R^{n+m}$ be open. Let $f:E\to \mathbb R^n$ be $\mathscr C^1$ on $E$ such that $f(\mathbf a, \mathbf b)=0$ for some $(\mathbf a,\mathbf b)\in E$. Write $A=f'(\mathbf a,\mathbf b)$, define $A_x\in L(\mathbb R^n, \mathbb R^n)$ and $A_y\in L(\mathbb R^m, \mathbb R^n)$ by $A_x\mathbf h=A(\mathbf h,\mathbf 0)$ and $A_y\mathbf k=A(\mathbf 0,\mathbf k)$, and suppose that $A_x$ is invertible. Then there exist open sets $U\subseteq \mathbb R^{n+m}$ and $W\subseteq \mathbb R^m$ with $(\mathbf a,\mathbf b)\in U$ and $\mathbf b\in W$ satisfying the following properties:

for every $y\in W$ there exists a unique $x\in \mathbf R^n$ such that $(x,y)\in U$ and $f(x,y)=0$;
if $x=g(y)$ is associated to $y$ as above, then $g:W\to \mathbb R^n$ is $\mathscr C^1$ and $g'(b)=-A_x^{-1} A_y \in L(\mathbb R^m,\mathbb R^n)$.

Remarks.

The inverse function theorem is a special case of the implicit function theorem: taking $f: \mathbb R^{n+n}\to \mathbb R^n$ given by $f(\mathbf x, \mathbf y)=g(\mathbf x)-\mathbf y$, finding $\mathbf x$ in terms of $\mathbf y$ under the constraint that $g(\mathbf x)-\mathbf y=0$ amounts to inverting $g$.
One may think of the implicit function theorem as solving for a vector-valued unknown $\mathbf x$ given a certain set of constraints $f(\mathbf x,\mathbf y)=0$;
It is crucial that the number of real-valued unknowns, that is the number of components of $\mathbf x$, be the same as the number of real-valued constraints, that is the number of components of $f(\mathbf x,\mathbf y)$.
As with the Inverse Function Theorem, the Implicit Function Theorem is only a one-way implication: the fact that $A_x$ is not invertible does not imply that there are no open sets $U$ and $W$ satisfying the properties.
However, in the special case when $f(\mathbf x,\mathbf y)=A_x \mathbf x + F(\mathbf y)$ for a linear operator $A_x\in L(\mathbb R^n,\mathbb R^n)$, if $\det(A_x)=0$ then one can never solve uniquely for $\mathbf x$: for a given $\mathbf y$, either there is no solution, or if there is one then there are infinitely many.
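
Example (the unit circle). A minimal worked case with $n=m=1$, chosen here for illustration: let $f(x,y)=x^2+y^2-1$ and let $(a,b)$ satisfy $f(a,b)=0$. Then $A_x=2a$ and $A_y=2b$. If $a\ne 0$, the theorem gives a $\mathscr C^1$ function $g$ near $b$ with $f(g(y),y)=0$ and $$g'(b) = -A_x^{-1}A_y = -\frac{b}{a}.$$ For $a>0$ this can be checked directly: $g(y)=\sqrt{1-y^2}$, so $g'(y) = -y/\sqrt{1-y^2} = -y/g(y)$. At $(a,b)=(0,1)$, on the other hand, $A_x=0$ and one cannot solve for $x$ near that point: for $y<1$ close to $1$ there are two solutions $x=\pm\sqrt{1-y^2}$ close to $0$, and for $y>1$ there is none.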

III. Around exchanges of limits

1. Higher order derivatives

Definition. Suppose $E\subseteq \mathbb R^n$ and $f:E\to \mathbb R$. If $D_i f$ exists on $E$ for some $i\in \{1,\dots, n\}$ and if $D_i f$ is such that $D_j (D_i f)$ also exists, we write $D_{ji}f=D_j(D_i f)$.

Remarks.

$D_{ji}f(\mathbf x)$ is called a second order partial derivative.
If the function $f$ is $\mathbb R^m$-valued, then we define the second order partial derivatives for its components.
With the notation $f(x_1,\dots, x_n)$ we write $$D_{ji}f(\mathbf x) = D_j(D_i f)(\mathbf x) = \frac{\partial }{\partial x_j}\frac{\partial f}{\partial x_i} (x_1,\dots, x_n)= \frac{\partial^2 f}{\partial x_j \partial x_i} (x_1,\dots, x_n).$$

Theorem. Let $E\subseteq \mathbb R^2$ be open. Suppose $f:E\to \mathbb R$ is such that $D_1 f$, $D_{21}f$ and $D_2 f$ exist at every point of $E$, and $D_{21}f$ is continuous at $(a,b)\in E$. Then $D_{12}f$ exists at $(a,b)$ and $$D_{12}f(a,b)=D_{21}f(a,b).$$

Corollary. If $f\in \mathscr C^2(E,\mathbb R)$, then $D_{21}f=D_{12}f$.

Remarks.

The corollary only provides a sufficient condition, but a very useful one.
The equality of the mixed second derivatives for $\mathscr C^2$ functions is crucial to many aspects of differential forms: this is precisely why one asks (1) for $\omega$ to be $\mathscr C^2$ for $d^2 \omega =0$, (2) for $T\in \mathscr C^2$ in the pull-back identity $(d \omega)_T = d(\omega_T)$, and (3) for the $k$-chain $\Psi$ to be $\mathscr C^2$ in Stokes' theorem.
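
Example (unequal mixed partial derivatives). A classical counterexample, sketched here to show that some continuity assumption is needed; the function is not taken from the notes. Let $$f(x,y)=\frac{xy(x^2-y^2)}{x^2+y^2} \text{ for } (x,y)\ne (0,0), \qquad f(0,0)=0.$$ Since $f(0,y)=f(x,0)=0$, one finds $$D_1f(0,y)=\lim_{t\to 0}\frac{f(t,y)}{t}=\lim_{t\to 0}\frac{y(t^2-y^2)}{t^2+y^2}=-y, \qquad D_2f(x,0)=\lim_{t\to 0}\frac{f(x,t)}{t}=\lim_{t\to 0}\frac{x(x^2-t^2)}{x^2+t^2}=x.$$ Therefore $$D_{21}f(0,0)=D_2(D_1f)(0,0)=-1 \qquad \text{while} \qquad D_{12}f(0,0)=D_1(D_2f)(0,0)=1.$$ By the theorem above, $D_{21}f$ cannot be continuous at $(0,0)$.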

2. Derivatives of parametric integrals

Theorem. Suppose that

$\varphi(x,t)$ is defined for $(x,t)\in [a,b]\times [c,d]$;
$\varphi(\cdot, t):[a,b]\to \mathbb R$ is Riemann integrable for every $t\in [c,d]$;
$s\in(c,d)$ and for every $\epsilon>0$ there exists a $\delta>0$ such that if $|t-s|<\delta$ then $$ \sup_{x\in [a,b]}|D_2\varphi (x,t) - D_2 \varphi (x,s)| < \epsilon;$$

Define the function $f$ on $[c,d]$ by $$f(t)=\int_a^b \varphi(x,t) dx.$$ Then $(D_2 \varphi)(\cdot, s)$ is Riemann integrable, $f$ is differentiable at $s$ and $$f'(s) = \int_a^b ( D_2 \varphi )(x,s) dx.$$

Remarks.

The essential condition is 3., which gives the uniform convergence that one uses to exchange the limits defining the integral and the partial derivative with respect to $t$.
If $D_2 \varphi$ is continuous on $[a,b]\times [c,d]$, then for every $x\in [a,b]$ and $s\in (c,d)$, $D_2\varphi (x,t)\to D_2\varphi(x,s)$ as $t\to s$; since $[a,b]$ is compact the convergence is actually uniform for $x\in [a,b]$ and the condition in 3. holds.
Stokes' theorem may also be seen as a theorem about exchanging limits.
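
Example (differentiating under the integral sign). A small check of the theorem on a case where everything can be computed, with $\varphi$ and the intervals chosen here for illustration: take $\varphi(x,t)=e^{tx}$ on $[0,1]\times [c,d]$ with $0<c<d$. Then $D_2\varphi(x,t)=xe^{tx}$ is continuous, so condition 3. holds by the second remark. Here $$f(t)=\int_0^1 e^{tx}\,dx = \frac{e^t-1}{t},$$ and the theorem gives, for $s\in (c,d)$, $$f'(s)=\int_0^1 xe^{sx}\,dx = \frac{e^s}{s}-\frac{e^s-1}{s^2},$$ which agrees with differentiating $(e^t-1)/t$ directly at $t=s$.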

Typical exercises you should be able to solve.

Given a function $f$ on $\mathbb R^2$: determine if $f$ is continuous; determine existence and compute partial derivatives (by calculus when it is possible, and by limits when it is necessary); identify the only candidate linear operator in the definition of differentiability; prove that $f$ is differentiable, or prove that it is not.
Compute derivatives using the chain rule: this might be to verify that certain functions defined by compositions satisfy certain differential equations.
Given a function $f$ and a point $\mathbf a$, determine if $f$ is one-to-one in a neighborhood of $\mathbf a$ using the inverse function theorem; identify the points at which $f$ might not be locally one-to-one (that is, for which there exists no open neighborhood on which $f$ is one-to-one), again using the inverse function theorem; among such candidates, find for which ones $f$ is indeed not locally one-to-one (here, this has to be specific to $f$, and cannot rely on the inverse function theorem); compute the derivative of an inverse function.
Given a function $f:\mathbb R^{n+m}\to \mathbb R^n$ and a point $(\mathbf a,\mathbf b)$, determine whether one can solve for $\mathbf x$ in terms of $\mathbf y$ in a neighborhood of $(\mathbf a,\mathbf b)$ using the implicit function theorem; find candidate points $(\mathbf a,\mathbf b)$ at which one may potentially not be able to solve for $\mathbf x$ in terms of $\mathbf y$ around $(\mathbf a,\mathbf b)$; among these candidate points, determine those at which one can really not solve for $\mathbf x$ in terms of $\mathbf y$ (this amounts to finding the solutions somewhat explicitly, and looking at limits to see if multiple curves reach the given points).
Given a ``surface-like'' set of points $(x,y,z)$ in $\mathbb R^3$ defined by a polynomial constraint on $x$, $y$ and $z$: determine if the set has a tangent plane at a given point; determine an equation for the plane; find subsets of points with certain simple properties (such as tangent plane parallel to something, etc.).
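
Example (tangent plane). A sketch of the last type of exercise, on a surface chosen here for illustration: let $F(x,y,z)=x^2+y^2+z^2-3$ and consider the point $(1,1,1)$, where $F=0$. Since $\partial F/\partial z(1,1,1)=2\ne 0$, the implicit function theorem shows that near $(1,1,1)$ the set $\{F=0\}$ is the graph of a $\mathscr C^1$ function $z=g(x,y)$, so it has a tangent plane there. With $\nabla F(1,1,1)=(2,2,2)$ normal to the level set, the plane is $$2(x-1)+2(y-1)+2(z-1)=0, \qquad \text{that is,} \qquad x+y+z=3.$$ The tangent plane is horizontal exactly where $\nabla F=(2x,2y,2z)$ is parallel to $(0,0,1)$, that is at the two points $(0,0,\pm\sqrt 3)$.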