Let’s start with a function \(f\) of a single variable \(x\). The derivative of \(f\) at the point \(x\) is \[\lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{h}=f'(x)\]

if this limit exists. Let’s define \(\phi(h)\) as

\[\phi(h)=\frac{f(x+h)-f(x)}{h}-f'(x).\]

Although \(\phi(h)\) is not defined at \(h=0\), it has a limit there. (In general, if \(\lim_{x\rightarrow a}g(x)=L\) for a function \(g\), then \(\lim_{x\rightarrow a}(g(x)-L)=0\); here \(g\) is the difference quotient and \(L=f'(x)\).) Therefore

\[\lim_{h\rightarrow 0} \phi(h)=0.\]

We can write (if \(h\neq 0\)):

\[\phi(h)=\frac{f(x+h)-f(x)}{h}-f'(x)\Rightarrow f(x+h)-f(x)=f'(x) h+h \phi(h)\]

or

\[\begin{aligned} f(x+h)=f(x)+f'(x) h+h\phi(h)\end{aligned}\]

Let’s define \(\varepsilon(h)\) by:

\[\varepsilon(h)=\left\{\begin{array}{ll} \phi(h) & \text{if } h>0\\ -\phi(h) & \text{if } h<0 \end{array} \right.\]

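Then \(h\phi(h)=|h|\varepsilon(h)\) for every \(h\neq 0\), and \(|\varepsilon(h)|=|\phi(h)|\). Therefore, if \(f\) is differentiable at \(x\), there exists a function \(\varepsilon(h)\) such that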

\[f(x+h)=f(x)+f'(x)h+|h|\varepsilon(h),\]

and \(\varepsilon(h)\rightarrow0\) as \(h\rightarrow0\).

Recall that \(f(x)+f'(x)h\) is the linearization of \(f\) at the point \(x\). Therefore \(|h|\varepsilon(h)\) is the error in this approximation. The equation above means that if \(f'(x)\) exists, the function \(f\) can be approximated by its linearization and the error is negligible compared to \(|h|\), that is, \(\dfrac{\text{error}}{|h|}\rightarrow 0\) as \(h\to0\) [1], where \(|h|\) is the distance between the point \(x+h\) and the point \(x\).
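The claim that the error shrinks faster than \(|h|\) can be checked numerically. The following is a minimal sketch, not part of the original text: the choice \(f=\sin\), the base point \(x=1\), and the helper name `epsilon` are all illustrative assumptions.

```python
import math

def epsilon(f, df, x, h):
    """Return eps(h) = (f(x+h) - f(x) - f'(x)*h) / |h| for h != 0."""
    return (f(x + h) - f(x) - df(x) * h) / abs(h)

# Illustrative choice (an assumption, not from the text):
# f = sin, f' = cos, base point x = 1.
x0 = 1.0
steps = [10.0 ** (-n) for n in range(1, 6)]  # h = 1e-1, ..., 1e-5

# Evaluate eps for positive and negative increments of the same size.
eps_pos = [epsilon(math.sin, math.cos, x0, h) for h in steps]
eps_neg = [epsilon(math.sin, math.cos, x0, -h) for h in steps]

for h, ep, en in zip(steps, eps_pos, eps_neg):
    print(f"h = {h:.0e}   eps(h) = {ep:+.3e}   eps(-h) = {en:+.3e}")
```

For this smooth example the printed values of \(\varepsilon(\pm h)\) shrink roughly in proportion to \(|h|\), which is consistent with \(\varepsilon(h)\to 0\). A numerical table like this only illustrates the limit; it does not prove it.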

Conversely, suppose there exists a number \(a\) and a function \(\varepsilon(h)\) such that

\[\label{Eq:Diff-1D} f(x+h)=f(x)+ah+|h|\varepsilon(h), \quad \text{and}\quad \lim_{h\to 0}\varepsilon(h)=0.\tag{i}\]

Subtracting \(f(x)\) from both sides and dividing by \(h\neq 0\):

\[\frac{f(x+h)-f(x)}{h}=a+\frac{|h|}{h}\varepsilon(h)\]

and taking the limit as \(h\to 0\):

\[\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=a+\underbrace{\lim_{h\to 0} \frac{|h|}{h}\varepsilon(h)}_{=0} \Rightarrow \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=a.\]

The underbraced limit is zero because \(|h|/h=\pm 1\) is bounded while \(\varepsilon(h)\to 0\). This means the derivative of \(f\) at the point \(x\) exists and is equal to \(a\):

\[f'(x)=a.\]

Therefore, we can define the differentiability of a function \(f\) at the point \(x\) as the existence of a number \(a\) and a function \(\varepsilon\) that satisfy Eq. (i). We can extend this definition to functions of two or more variables.

Let \(z=f(x,y)\). We say \(f\) is differentiable at the point \((x,y)\) if there exist two numbers \(a\) and \(b\), and a function \(\varepsilon(h,k)\) such that:

\[f(x+h,y+k)=f(x,y)+ah+bk+\underbrace{\left|(h,k)\right|}_{=\sqrt{h^2+k^2}}\varepsilon(h,k)\]

(note that the magnitude, or absolute value, of a vector \((h,k)\) is \(|(h,k)|=\sqrt{h^2+k^2}\)) and

\[\lim_{(h,k)\rightarrow (0,0)}\varepsilon(h,k)=0.\]

If such an approximation is valid, set \(k=0\), subtract \(f(x,y)\) from both sides, divide by \(h\neq 0\), and take the limit as \(h\to 0\). Then we have:

\[\underbrace{\lim_{h\to 0} \frac{f(x+h,y)-f(x,y)}{h}}_{=f_x(x,y)}=a+\underbrace{\lim_{h\to 0}\frac{\sqrt{h^2}}{h}\varepsilon(h,0)}_{=0}\Rightarrow f_x(x,y)=a.\]

Similarly, setting \(h=0\) and dividing by \(k\neq 0\), we can show: \[f_y(x,y)=b.\]

**Definition 1.** We say a function \(f\) is **differentiable** at the point \((x,y)\) if its partial derivatives \(f_x(x,y)\) and \(f_y(x,y)\) exist and there exists a function \(\varepsilon\) such that:

\[f(x+h,y+k)=f(x,y)+f_x(x,y)h+f_y(x,y)k+\sqrt{h^2+k^2}\varepsilon(h,k)\]

and

\[\lim_{(h,k)\rightarrow (0,0)}\varepsilon(h,k)=0.\]

The above definition means that if \(f(x,y)\) is differentiable, it can be approximated by its linearization, and the error in this approximation, \(\sqrt{h^2+k^2}\,\varepsilon(h,k)\), is negligible compared to \(\rho\) as \(\rho\to 0\), where \(\rho=\sqrt{h^2+k^2}\) is the distance between the point \((x+h,y+k)\) and the point \((x,y)\).
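To see Definition 1 at work, one can compute \(\varepsilon(h,k)\) along several directions and watch it shrink with the distance \(\rho\). The sketch below is only an illustration under stated assumptions: the function \(f(x,y)=x^2y\), the base point \((1,2)\), the sampled angles, and the names `f_x`, `f_y`, `epsilon` are not from the original text.

```python
import math

def f(x, y):
    # Illustrative function (an assumption, not from the text).
    return x * x * y

def f_x(x, y):
    # Partial derivative of f with respect to x, computed by hand.
    return 2.0 * x * y

def f_y(x, y):
    # Partial derivative of f with respect to y, computed by hand.
    return x * x

def epsilon(x, y, h, k):
    """eps(h,k) = (f(x+h,y+k) - f - f_x*h - f_y*k) / sqrt(h^2 + k^2)."""
    rho = math.hypot(h, k)
    linear = f(x, y) + f_x(x, y) * h + f_y(x, y) * k
    return (f(x + h, y + k) - linear) / rho

x0, y0 = 1.0, 2.0
angles = [0.0, math.pi / 4, math.pi / 2, 2.0, 4.0]  # sampled directions
radii = [10.0 ** (-n) for n in range(1, 6)]         # rho = 1e-1, ..., 1e-5

# For each radius, record the largest |eps| over the sampled directions.
worst = [
    max(abs(epsilon(x0, y0, r * math.cos(t), r * math.sin(t))) for t in angles)
    for r in radii
]
for r, w in zip(radii, worst):
    print(f"rho = {r:.0e}   max |eps| = {w:.3e}")
```

Sampling a handful of directions cannot replace the two-variable limit in the definition, which must hold along every path to \((0,0)\). It is a quick sanity check that the linearization error is small relative to \(\rho\).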

From the definition of differentiability it is clear that if a function is differentiable at \((x,y)\), then it is continuous there. (Take the limit as \((h,k)\to(0,0)\) of both sides, and remember that \(f_x(x,y)\) and \(f_y(x,y)\) are the values of \(f_x\) and \(f_y\) at \((x,y)\), not functions of \(h\) or \(k\), so they are just two constants.) Therefore, **if a function is not continuous at a point, it cannot be differentiable there**.

differentiability \(\Rightarrow\) continuity

Also according to Definition 1, if a function is differentiable, its first partial derivatives exist. Therefore, **if the first partial derivatives of a function do not exist at a point, the function is not differentiable at that point.**

differentiability \(\Rightarrow\) existence of first partial derivatives

Using Definition 1 to verify whether a function is differentiable is often hard. Here we introduce a theorem that can be applied to most functions to show that they are differentiable.

**Theorem 1.** If the first partial derivatives of a function \(f\) exist in some neighborhood of \(\mathbf{x}_0\) and are continuous at \(\mathbf{x}_0\), then \(f\) is differentiable at \(\mathbf{x}_0\).

A function that has continuous first partial derivatives is called continuously differentiable, or a function of class \(C^1\). A function whose first and second partial derivatives are all continuous is called twice continuously differentiable, or a function of class \(C^2\). In the same manner we can define functions of class \(C^3\), \(C^4\), and so on. A function that has continuous partial derivatives of all orders is called a \(C^\infty\) function. A function that is continuous is referred to as a function of class \(C^0\).

To conclude this section, we extend the definition of differentiability to functions of \(n\) variables:

**Definition 2.** We say a function \(f:U\subseteq \mathbb{R}^n\to \mathbb{R}\) is differentiable at \((x_1,\cdots,x_n)\) if its first partial derivatives \(\frac{\partial f}{\partial x_i}(x_1,\cdots,x_n)\) (for \(i=1,\cdots,n\)) exist and there exists a function \(\varepsilon(h_1,\cdots,h_n)\) such that:

\[\begin{aligned} f(x_1+h_1,\cdots,x_n+h_n)={}&f(x_1,\cdots,x_n)+\frac{\partial f}{\partial x_1}h_1+\cdots+\frac{\partial f}{\partial x_n}h_n\\ &+\sqrt{h_1^2+\cdots+h_n^2}\,\varepsilon(h_1,\cdots,h_n)\end{aligned}\]

and

\[\lim_{(h_1,\cdots,h_n)\to(0,\cdots,0)} \varepsilon(h_1,\cdots,h_n)=0.\]

The partial derivatives are evaluated at \((x_1,\cdots,x_n)\).
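The same error ratio can be written for any number of variables by treating the point and the increment as sequences. This is a rough sketch under assumptions that are not in the original text: the function \(f(x_1,x_2,x_3)=x_1x_2+\sin x_3\), the base point, the direction vector, and the names `grad_f` and `epsilon` are all hypothetical choices for illustration.

```python
import math

def f(x):
    # Illustrative function of n = 3 variables (an assumption).
    return x[0] * x[1] + math.sin(x[2])

def grad_f(x):
    # First partial derivatives of f, computed by hand.
    return (x[1], x[0], math.cos(x[2]))

def epsilon(x, h):
    """Linearization error divided by sqrt(h_1^2 + ... + h_n^2)."""
    norm_h = math.sqrt(sum(hi * hi for hi in h))
    shifted = [xi + hi for xi, hi in zip(x, h)]
    linear = f(x) + sum(g * hi for g, hi in zip(grad_f(x), h))
    return (f(shifted) - linear) / norm_h

x0 = (1.0, 2.0, 0.5)
direction = (1.0, -2.0, 2.0)                 # one arbitrary direction
scales = [10.0 ** (-n) for n in range(1, 6)]  # shrink the increment

eps_values = [epsilon(x0, [t * d for d in direction]) for t in scales]
for t, e in zip(scales, eps_values):
    print(f"t = {t:.0e}   eps = {e:+.3e}")
```

Only one direction is sampled here, so the output illustrates the behavior of \(\varepsilon\) along a single line through the origin rather than the full limit in Definition 2.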

[1] Mathematically, this is often written as \(\text{error}=o(|h|)\).