I. Essential Notions and Concepts
Definition (Differentiability of functions on $\mathbb R^n$).
Suppose $E\subset \mathbb R^n$ is open, and $f: E\to \mathbb R^m$, and $\mathbf x\in E$.
The map $f$ is said to be differentiable at $\mathbf x$ if there exists a linear transformation
$A\in L(\mathbb R^n, \mathbb R^m)$ such that
$$\lim_{\mathbf h\to 0} \frac{|f(\mathbf x+\mathbf h)-f(\mathbf x)-A \mathbf h|}{|\mathbf h|} = 0.$$
In this case, the linear operator $A$ is denoted by $f'(\mathbf x)$ and is called the derivative of $f$ at $\mathbf x$.
Remarks.
- The limit in the definition is a limit as $\mathbf h\to 0$ in $\mathbb R^n$; in particular, it must hold however $\mathbf h$ tends to zero, and one cannot choose a particular path along which to check it!
- Of course, the definition above relies on the fact that the limit can be zero for at most one such linear operator $A$; to prove this, assume there are two and verify that they must actually be the same.
- If $f$ is differentiable at $\mathbf x$, then it is continuous at $\mathbf x$; this is sometimes useful to disprove differentiability.
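The definition can be checked numerically on a concrete map. The sketch below is only an illustration: the map `f`, the point $(1,2)$ and the candidate operator `A` (the matrix of partial derivatives of this particular `f`) are assumptions chosen for the example, not taken from the text. It evaluates the quotient from the definition along two different ways of sending $\mathbf h$ to zero.

```python
import math

def f(x, y):
    # Hypothetical example map f: R^2 -> R^2.
    return (x * x * y, x + y)

def A(x, y, h, k):
    # Candidate derivative of f at (x, y), applied to the vector (h, k).
    # Its matrix is [[2xy, x^2], [1, 1]].
    return (2 * x * y * h + x * x * k, h + k)

def quotient(x, y, h, k):
    # |f(x + h) - f(x) - A h| / |h| from the definition of differentiability.
    fx = f(x, y)
    fxh = f(x + h, y + k)
    Ah = A(x, y, h, k)
    num = math.hypot(fxh[0] - fx[0] - Ah[0], fxh[1] - fx[1] - Ah[1])
    return num / math.hypot(h, k)

for n in range(1, 6):
    t = 10.0 ** (-n)
    # Two different approaches to zero: along (t, -t) and along (t, t^2).
    print(t, quotient(1.0, 2.0, t, -t), quotient(1.0, 2.0, t, t * t))
```

Both columns shrink as $t\to 0$, which is consistent with differentiability at $(1,2)$; a numerical check along finitely many paths is of course evidence, not a proof.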
Theorem (Chain rule). Let $E\subset \mathbb R^n$ be open, and $f:E\to \mathbb R^m$ be differentiable at $\mathbf x_0\in E$. Let also $V \subset \mathbb R^m$ be open with $f(E)\subset V$ and $g:V\to \mathbb R^k$ be differentiable at $f(\mathbf x_0)$. Then the map $g\circ f: E\to \mathbb R^k$ is differentiable at $\mathbf x_0$ and
$$(g\circ f)'(\mathbf x_0) = g'(f(\mathbf x_0)) f'(\mathbf x_0).$$
Remarks.
- The right-hand side is of course the linear operator obtained by composing $g'(f(\mathbf x_0))$ and $f'(\mathbf x_0)$, in this order!
- Note that the operators on each side are in $L(\mathbb R^n, \mathbb R^k)$.
- The chain rule can be interpreted in terms of the identity between the matrix $[(g\circ f)'(\mathbf x_0)]$ and the product of matrices $[g'(f(\mathbf x_0))] \times [f'(\mathbf x_0)]$.
Definition (Partial derivatives). Let $f:E\to \mathbb R^m$ for an open $E\subset \mathbb R^n$. Let $(\mathbf e_1,\dots, \mathbf e_n)$ and $(\mathbf u_1,\dots, \mathbf u_m)$ be the standard bases of $\mathbb R^n$ and $\mathbb R^m$, respectively. Then the $j$th partial derivative $D_jf_i$ of the $i$th component $f_i:E\to \mathbb R$ of $f$ is defined as
$$D_jf_i(\mathbf x) = \lim_{t\to 0} \frac{f_i(\mathbf x + t \mathbf e_j)-f_i(\mathbf x)}{t},$$
provided that the limit exists.
Remarks.
- Even if all the partial derivatives exist at a point $\mathbf x \in E$, the function $f$ may fail to be differentiable: indeed, for the partial derivatives to exist, the function only needs to be nice on the $n$ orthogonal lines parallel to the $n$ axes that meet at the point $\mathbf x$; the values of the function anywhere else play no role!
- If we write $\mathbf x=(x_1,x_2,\dots, x_n)$ and see $f(\mathbf x)$ as $f(x_1,\dots, x_n)$ then one usually writes
$$D_j f_i(\mathbf x)=\frac{\partial f_i} {\partial x_j} (x_1,x_2,\dots, x_n).$$
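A standard example makes the first remark concrete. The function below (an assumption chosen for illustration, not taken from the text) is $f(x,y)=xy/(x^2+y^2)$ with $f(0,0)=0$: it vanishes on both axes, so both partial derivatives exist at the origin, yet it equals $1/2$ on the diagonal, so it is not continuous at the origin and hence not differentiable there.

```python
def f(x, y):
    # Hypothetical example: zero on both axes, 1/2 on the diagonal x = y.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_quotient(j, t):
    # Difference quotient defining D_j f at the origin (j = 1 or 2).
    if j == 1:
        return (f(t, 0.0) - f(0.0, 0.0)) / t
    return (f(0.0, t) - f(0.0, 0.0)) / t

for t in (1e-1, 1e-3, 1e-6):
    # The two quotients are identically 0, while f(t, t) stays at 1/2.
    print(t, partial_quotient(1, t), partial_quotient(2, t), f(t, t))
```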
Theorem. Let $E\subset \mathbb R^n$ be open, and $f:E\to \mathbb R^m$ be differentiable at $\mathbf x\in E$. Then all the partial derivatives $D_jf_i$ exist at the point $\mathbf x$ and we have, for all $j\in \{1,\dots, n\}$,
$$ f'(\mathbf x) \mathbf e_j = \sum_{i=1}^m (D_jf_i)(\mathbf x) \mathbf u_i.$$
Remarks.
- This says that the linear map $f'(\mathbf x)$, if it exists, is determined by all the partial derivatives;
- More precisely, if $f$ is differentiable at $\mathbf x$, the matrix of $f'(\mathbf x)$ can be written as
$$[f'(\mathbf x)]=
\begin{pmatrix}
D_1 f_1(\mathbf x) & D_2f_1(\mathbf x) & \dots & D_n f_1(\mathbf x)\\
D_1 f_2(\mathbf x) & D_2f_2(\mathbf x) & \dots & D_n f_2(\mathbf x)\\
\vdots & & & \vdots\\
D_1 f_m(\mathbf x) & D_2 f_m(\mathbf x) & \dots & D_n f_m(\mathbf x)
\end{pmatrix},$$
where the $j$th column is the vector of the derivatives of all the components of $f$ with respect to the $j$th component of $\mathbf x$.
- In particular, the chain rule can be expressed in terms of a product of such matrices. With the assumptions in the theorem on the chain rule above:
$$
[(g\circ f)'(\mathbf x)]
= \begin{pmatrix}
D_1 g_1(f(\mathbf x)) & \dots & D_m g_1(f(\mathbf x))\\
D_1 g_2(f(\mathbf x)) & \dots & D_m g_2(f(\mathbf x))\\
\vdots & & \vdots\\
D_1 g_k(f(\mathbf x)) & \dots & D_m g_k(f(\mathbf x))\\
\end{pmatrix}
\cdot
\begin{pmatrix}
D_1 f_1 (\mathbf x) & \dots & D_n f_1(\mathbf x)\\
D_1 f_2 (\mathbf x) & \dots & D_n f_2(\mathbf x)\\
\vdots & & \vdots\\
D_1 f_m (\mathbf x) & \dots & D_n f_m(\mathbf x)\\
\end{pmatrix}
$$
- The matrix product above in turn implies another form for the chain rule: setting $\mathbf y=f(\mathbf x)$, for each $i\in \{1,\dots, k\}$ and each $j\in \{1,\dots, n\}$,
$$\frac{\partial (g\circ f)_i}{\partial x_j}(x_1,\dots, x_n) = \sum_{p = 1 }^m \frac{\partial g_i}{\partial y_p} (f(x_1,\dots, x_n)) \cdot \frac{\partial f_p}{\partial x_j} (x_1,\dots, x_n).$$
The meaning of the chain rule might then appear more clearly: changing $x_j$ by $\Delta x_j$ makes $f_p(\mathbf x)$ change approximately by $\Delta y_p = \partial f_p / \partial x_j(\mathbf x) \cdot \Delta x_j$, for each $p\in \{1,\dots, m\}$; but for each $p$, a change in $y_p$ by $\Delta y_p$ in turn induces a change in $g_i(\mathbf y)$ by approximately $\partial g_i / \partial y_p(\mathbf y) \cdot \Delta y_p$,
and these changes are additive.
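The matrix form of the chain rule lends itself to a quick numerical sanity check. In the sketch below, the maps `f` and `g`, the point `x0` and the helpers `jacobian` and `matmul` are all hypothetical choices made for illustration; the matrices of partial derivatives are approximated by central differences, so the two sides agree only up to a small numerical error.

```python
import math

def f(x):
    # Hypothetical f: R^2 -> R^2.
    return [x[0] * x[1], x[0] + x[1]]

def g(y):
    # Hypothetical g: R^2 -> R^2.
    return [y[0] ** 2 + math.sin(y[1]), y[0] * y[1]]

def jacobian(func, x, eps=1e-6):
    # Entry [i][j] approximates D_j func_i(x) by a central difference.
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    # Plain matrix product; entry (i, j) is the sum over p of A[i][p] * B[p][j].
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x0 = [1.0, 2.0]
lhs = jacobian(lambda x: g(f(x)), x0)               # [(g o f)'(x0)]
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))   # [g'(f(x0))] [f'(x0)]
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The sum over `p` inside `matmul` is exactly the sum over $p$ in the component formula: each intermediate variable $y_p$ contributes one additive term.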
Definition (Continuously differentiable on $\mathbb R^n$). Suppose $E\subset \mathbb R^n$ is open, and $f: E\to \mathbb R^m$.
The function $f$ is said to be continuously differentiable on $E$ if $f$ is differentiable on $E$ and the map $f': E\to L(\mathbb R^n,\mathbb R^m)$ is continuous.
Remarks.
- The continuity of $f'$ is that of a map from $E\subset \mathbb R^n$ into $L(\mathbb R^n,\mathbb R^m)$; the metric $d(\cdot,\cdot)$ on $L(\mathbb R^n,\mathbb R^m)$ that one uses is the one induced by the operator norm $\| \cdot \|$ defined by $\|A\| = \sup\{ |A \mathbf v|: |\mathbf v|\le 1 \}$, via $d(A,B)=\|A-B\|.$
- Continuous differentiability is genuinely different from differentiability: there exists a function $f$ that is differentiable on an open set $E$ containing $\mathbf x_0$ but whose derivative $f'$ is not continuous at $\mathbf x_0$ (see Exercise 16 p 241).
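A classical one-variable instance ($n=m=1$) of the last remark is $g(t)=t^2\sin(1/t)$ with $g(0)=0$; it is offered here as an assumption-level illustration and is not necessarily the function of the cited exercise. The difference quotient at $0$ is $t\sin(1/t)$, which tends to $0$, so $g'(0)=0$; but for $t\neq 0$, $g'(t)=2t\sin(1/t)-\cos(1/t)$ keeps returning to values near $-1$ arbitrarily close to $0$.

```python
import math

def g(t):
    # Hypothetical example: differentiable everywhere, g' discontinuous at 0.
    return t * t * math.sin(1.0 / t) if t != 0.0 else 0.0

def g_prime(t):
    # Closed form for t != 0; the value at 0 comes from the definition.
    if t == 0.0:
        return 0.0
    return 2.0 * t * math.sin(1.0 / t) - math.cos(1.0 / t)

for n in (10, 100, 1000):
    t = 1.0 / (2.0 * math.pi * n)
    # g(t)/t is the difference quotient at 0; g_prime(t) stays close to -1.
    print(t, g(t) / t, g_prime(t))
```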
Theorem. Let $E\subset \mathbb R^n$ be open. The function $f:E\to \mathbb R^m$ is continuously differentiable if and only if all the partial derivatives $D_jf_i$, $1\le i\le m$, $1\le j\le n$, of $f$ exist
and are continuous on $E$.
Definition (Jacobian). If $E\subseteq \mathbb R^n$ is open and $f:E\to \mathbb R^n$ is differentiable at $\mathbf x\in E$, the Jacobian of $f$ at $\mathbf x$, denoted by $J_f(\mathbf x)$, is defined as the determinant of the linear operator $f'(\mathbf x)\in L(\mathbb R^n)$.
Remarks
- For the Jacobian of $f$ to make sense, it is crucial that $f$ be defined on a subset of $\mathbb R^n$ and take values in $\mathbb R^n$, since the matrix $[f'(\mathbf x)]$ should be a square matrix!
- The notation
$$J_f=\frac{\partial(f_1,f_2,\dots, f_n)}{\partial(x_1,\dots, x_n)}$$
is also useful.
- The operator $f'(\mathbf x)$ is invertible if and only if $J_f(\mathbf x)\ne 0$.
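As an illustration of the Jacobian, consider the polar-coordinates map $f(r,\theta)=(r\cos\theta, r\sin\theta)$ (an example chosen here, not taken from the text). Its matrix of partial derivatives is written out by hand below, and the determinant works out to $r$, so $f'(r,\theta)$ is invertible exactly when $r\neq 0$. The function names are hypothetical.

```python
import math

def polar_derivative(r, theta):
    # Matrix of f'(r, theta) for f(r, theta) = (r cos(theta), r sin(theta)).
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def det2(M):
    # Determinant of a 2 x 2 matrix.
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def jacobian_det(r, theta):
    # J_f(r, theta); analytically r (cos^2 + sin^2) = r.
    return det2(polar_derivative(r, theta))

print(jacobian_det(2.0, 0.7))
print(jacobian_det(0.0, 0.3))
```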
Additional notions.
- Directional derivative: for $f:E\subseteq \mathbb R^n\to \mathbb R$ and $\mathbf u$ a unit vector in $\mathbb R^n$, one defines $D_{\mathbf u} f$ by
$$D_{\mathbf u}f(\mathbf x) = \lim_{t\to 0} \frac{f(\mathbf x+t \mathbf u)-f(\mathbf x)}{t},$$
when the limit exists.
- If $f$ is differentiable at $\mathbf x$, then $D_{\mathbf u}f(\mathbf x) = \sum_{i=1}^n u_i D_i f(\mathbf x)$.
- If $f$ is differentiable and real valued, the gradient of $f$, denoted by $\operatorname{grad} f = \nabla f$, is defined as the $n$-vector of partial derivatives of $f$: $\nabla f(\mathbf x) = \sum_{i=1}^n D_if (\mathbf x) \mathbf e_i$.
- For $f:E\to \mathbb R$ differentiable, $D_{\mathbf u}f(\mathbf x) = \nabla f(\mathbf x) \cdot \mathbf u$; in particular, it follows from the Cauchy-Schwarz inequality that $|D_{\mathbf u}f(\mathbf x)| \le |\nabla f(\mathbf x)|$ and that $D_{\mathbf u} f(\mathbf x)$ is maximized when $\nabla f(\mathbf x)$ and $\mathbf u$ are parallel and in the same direction; so $\nabla f$ points in the direction in which $f$ ``changes most''
- For $f:E\to \mathbb R$ of class $\mathscr C^1$, the level sets $\mathcal L(c)=\{\mathbf x: f(\mathbf x)=c\}$ are everywhere normal to the gradient $\nabla f$.
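The relation between directional derivatives and the gradient can be explored numerically. The sketch below assumes the example function $f(x,y)=x^2+3y$ and the point $(1,2)$, both chosen for illustration; it approximates $D_{\mathbf u}f$ by a central difference for 360 unit vectors and looks for the direction giving the largest value.

```python
import math

def f(x, y):
    # Hypothetical example function with gradient (2x, 3).
    return x * x + 3.0 * y

def grad_f(x, y):
    return (2.0 * x, 3.0)

def directional(x, y, u, t=1e-6):
    # Central-difference approximation of D_u f at (x, y).
    return (f(x + t * u[0], y + t * u[1]) - f(x - t * u[0], y - t * u[1])) / (2 * t)

x0, y0 = 1.0, 2.0
g = grad_f(x0, y0)
norm_g = math.hypot(g[0], g[1])

# Scan unit vectors u = (cos a, sin a) at one-degree steps.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
best_value, best_angle = max(
    (directional(x0, y0, (math.cos(a), math.sin(a))), a) for a in angles
)

print(best_value, norm_g)                  # largest sampled D_u f vs |grad f|
print(best_angle, math.atan2(g[1], g[0]))  # best sampled angle vs gradient angle
```

Up to the one-degree sampling step, the best direction found is that of $\nabla f(1,2)=(2,3)$, and the largest directional derivative does not exceed $|\nabla f(1,2)|$.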