Constrained Extrema
In practice, we often wish to optimize a function in the presence of constraints. In economics and engineering, the constraint may reflect limited funds, materials, or energy. If we wish to find the distance from a point \(P=(x_0,y_0)\) to a line \(ax+by=c\), we need to find the minimum value of \(d(x,y)=\sqrt{(x-x_0)^2+(y-y_0)^2}\) while \((x,y)\) satisfies \(ax+by=c\). Or suppose \(T(x,y,z)\) represents the temperature in space, and we want to find the maximum temperature on a surface given by \(g(x,y,z)=0\). When the constraint (also called the side condition) equation is given, we can solve it for one of the variables, say \(z=\phi(x,y)\), and substitute it into the function \(T\). The problem then reduces to finding the extreme values of the function \(T(x,y,\phi(x,y))\), which depends on only two independent variables. We have already applied this method in a previous example, where we optimized the value of a function on the boundary of a region. To practice this method, let's work through a small instance of the distance-to-a-line problem.
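Here is a minimal sympy sketch of the substitution approach; the point \((1,2)\) and the line \(3x+4y=12\) are hypothetical choices for illustration.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Hypothetical data: distance from P = (1, 2) to the line 3x + 4y = 12.
x0, y0 = 1, 2
a, b, c = 3, 4, 12

# Solve the constraint for y and substitute, reducing to one variable.
y = (c - a*x) / b

# Minimize the squared distance (same minimizer, simpler derivative).
d2 = (x - x0)**2 + (y - y0)**2

# A critical point of d2 is where its derivative vanishes.
crit = sp.solve(sp.diff(d2, x), x)[0]
d_min = sp.sqrt(d2.subs(x, crit))

print(crit, sp.simplify(d_min))
```

The printed distance, \(\tfrac{1}{5}\), agrees with the familiar formula \(|ax_0+by_0-c|/\sqrt{a^2+b^2}\).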
Lagrange Multipliers
When the constraint is given implicitly by \(g(x,y,z)=c\), it is not always possible or easy to solve the constraint equation for one of the variables (that is, to express \(x, y\), or \(z\) as a function of the remaining variables). The problem can be more complicated when there is more than one constraint. In such cases, an alternative is the method of Lagrange multipliers.
To explain this method, let's start with an example. Suppose we want to find the shortest and longest distances between the point \((1,-1)\) and the curve \(C\)
\[g(x,y)=\frac{(x-y-2)^2}{18}+\frac{(x+y)^2}{32}=1.\]
The distance between a point \((x,y)\) and \((1,-1)\) is given by \[f(x,y)=\sqrt{(x-1)^2+(y+1)^2}.\] So we want to maximize and minimize \(f(x,y)\) subject to \(g(x,y)=1\). First let’s sketch the curve \(C\) and some level curves of \(f\) (Fig. 1).
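A quick way to produce such a sketch is with matplotlib; here is a minimal version (the plotting window and the level values are chosen by eye):

```python
import numpy as np
import matplotlib.pyplot as plt

# A rough version of Fig. 1: the constraint curve g(x, y) = 1
# together with a few level curves of the distance f(x, y) = k.
x, y = np.meshgrid(np.linspace(-5, 7, 400), np.linspace(-7, 5, 400))

g = (x - y - 2)**2 / 18 + (x + y)**2 / 32
f = np.sqrt((x - 1)**2 + (y + 1)**2)

plt.contour(x, y, g, levels=[1], colors='black')          # the curve C
plt.contour(x, y, f, levels=[2, 3, 4, 5], colors='gray')  # circles f = k
plt.plot(1, -1, 'k.')                                      # the point (1, -1)
plt.gca().set_aspect('equal')
plt.show()
```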
To extremize \(f(x,y)\) subject to \(g(x,y)=1\), we have to find the largest and smallest values of \(k\) such that the level curve \(f(x,y)=k\) intersects \(g(x,y)=1\). Among the level curves that intersect \(g(x,y)=1\), the minimum value of \(f(x,y)\) occurs at the points \(P\) and \(Q\), where \(f(x,y)=3\) (see Fig. 1). At these points, the constraint curve \(g(x,y)=1\) and the level curve \(f(x,y)=3\) just touch each other; in other words, \(f=3\) and \(g=1\) have a common tangent line at \(P\) and at \(Q\).
Note that \(g(x,y)=1\) is a level curve of \(z=g(x,y)\). Because at each point \(\overrightarrow{\nabla} f\) is perpendicular to the level curves of \(f\) and, similarly, \(\overrightarrow{\nabla} g\) is perpendicular to the level curves of \(g\), a common tangent at \(P\) means that \(\overrightarrow{\nabla} f(P)\) and \(\overrightarrow{\nabla} g(P)\) are parallel. That is, there is a number \(\lambda_1\) such that
\[\overrightarrow{\nabla} f(P)=\lambda_1 \overrightarrow{\nabla} g(P).\]
Similarly, there is a number \(\lambda_2\) such that
\[\overrightarrow{\nabla} f(Q)=\lambda_2\overrightarrow{\nabla} g(Q).\]
We also observe that the maximum value of \(f(x,y)\) subject to \(g(x,y)=1\) occurs where the constraint curve and a level curve (here \(f=4\)) touch each other (in Fig. 1, these points are denoted by \(A\) and \(B\)). Thus \[\overrightarrow{\nabla} f(A)=\mu_1\overrightarrow{\nabla} g(A),\] and \[\overrightarrow{\nabla} f(B)=\mu_2 \overrightarrow{\nabla} g(B),\] for some \(\mu_1\) and \(\mu_2\). Therefore, to find the maximum or minimum of \(f(x,y)\) subject to the constraint \(g(x,y)=1\), we look for a point \(\textbf{x}_0\) such that
\[\overrightarrow{\nabla} f(\textbf{x}_0)=\lambda \overrightarrow{\nabla} g(\textbf{x}_0)\]
for some \(\lambda\). This is the method of Lagrange multipliers. But why is this true?
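Before seeing why, let's check this condition on our example with a short sympy computation. The sketch below uses the squared distance, which has the same extremizers as \(f\) but avoids the square root (since \(\overrightarrow{\nabla}(f^2)=2f\overrightarrow{\nabla} f\)).

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

# Squared distance to (1, -1): same extremizers as f, no square root.
f2 = (x - 1)**2 + (y + 1)**2
g = (x - y - 2)**2 / 18 + (x + y)**2 / 32

# The Lagrange system: grad(f2) = lambda * grad(g), together with g = 1.
eqs = [
    sp.Eq(sp.diff(f2, x), lam * sp.diff(g, x)),
    sp.Eq(sp.diff(f2, y), lam * sp.diff(g, y)),
    sp.Eq(g, 1),
]

for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol[x], sol[y], sp.sqrt(sp.simplify(f2.subs(sol))))
```

The four solutions are exactly the touching points of Fig. 1: \(P\) and \(Q\) at distance 3, and \(A\) and \(B\) at distance 4.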
To see why the method works, suppose the constraint curve \(C\) is parameterized by some functions:1 \[x=X(t),\quad y=Y(t).\] If, in the formula for \(f\), we replace \(x\) and \(y\) by \(X(t)\) and \(Y(t)\), then the distance between \((1,-1)\) and points on \(C\) becomes a function of \(t\): \[F(t)=f(X(t),Y(t)).\] Therefore, the extreme values of the distance occur where \(F'(t)=0\). From the chain rule we know
\[F'(t)=\frac{dF}{dt}=\frac{\partial f}{\partial x}\frac{dX}{dt}+\frac{\partial f}{\partial y}\frac{dY}{dt}.\]
We can write the equation \(F'(t)=0\) as
\[\begin{align} \frac{\partial f}{\partial x}\frac{dX}{dt}+\frac{\partial f}{\partial y}\frac{dY}{dt}&=0\\ \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\boldsymbol{\cdot}\left(\frac{dX}{dt},\frac{dY}{dt}\right)&=0\end{align}\]
This means at the extreme points the gradient vector \((f_x,f_y)\) is perpendicular to \((X'(t),Y'(t))\). Recall that \((X'(t),Y'(t))\) is tangent to the curve \(C\).
On the other hand, \(C\) is a level curve of \(g\), so at each point of \(C\), \(\overrightarrow{\nabla} g\) is perpendicular to \(C\). Therefore, at the extreme points, \(\overrightarrow{\nabla} g\) and \(\overrightarrow{\nabla} f\) are parallel.
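We can also carry out this one-variable reduction concretely for our example, using the parameterization of \(C\) given in footnote 1 (a minimal sympy sketch):

```python
import sympy as sp

t = sp.symbols('t', real=True)

# Parameterization of C from footnote 1.
X = 3/sp.sqrt(2)*sp.cos(t) + 2*sp.sqrt(2)*sp.sin(t) + 1
Y = -3/sp.sqrt(2)*sp.cos(t) + 2*sp.sqrt(2)*sp.sin(t) - 1

# Squared distance to (1, -1), reduced to a function of t alone.
F = (X - 1)**2 + (Y + 1)**2

# Solve F'(t) = 0 on one period.
Fp = sp.trigsimp(sp.diff(F, t))  # should simplify to 7*sin(2*t)
crit = sp.solveset(sp.Eq(Fp, 0), t, sp.Interval(0, 2*sp.pi, right_open=True))
for tc in crit:
    print(tc, sp.sqrt(sp.trigsimp(F.subs(t, tc))))  # distances 3 and 4
```

The four critical points again give distances 3 and 4, matching the touching points in Fig. 1. The simplest version of the method of Lagrange multipliers is as follows.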
Theorem 1. Suppose \(U\subseteq \mathbb{R}^n\) is an open set and \(f:U\to\mathbb{R}\) and \(g:U\to\mathbb{R}\) are two continuously differentiable functions2. Let \(S=\left\{\mathbf{x}\in U|\ g(\mathbf{x})=c\right\}\) be the level set of \(g\) with value \(c\). If \(f|S\), denoting “\(f\) restricted to \(S\),” has a relative extremum on \(S\) at \(\mathbf{x}_0\in S\), and \(\overrightarrow{\nabla} g(\mathbf{x}_0)\neq\mathbf{0}\), then there exists a real number \(\lambda\) such that
\[\overrightarrow{\nabla} f(\mathbf{x}_0)=\lambda \overrightarrow{\nabla} g(\mathbf{x}_0).\]
Proof.
Let \(\mathbf{r}:I\subseteq\mathbb{R}\to\mathbb{R}^n\) be a differentiable curve on the level set \(S\) such that \(\mathbf{r}(t_0)=\mathbf{x}_0\), \(\mathbf{r}'(t_0)\neq \mathbf{0}\), and \(\mathbf{r}(t)\in S\) for every \(t\in I\). Then \(g(\mathbf{r}(t))=c\), so the chain rule gives
\[\overrightarrow{\nabla} g(\underbrace{\mathbf{r}(t_0)}_{=\mathbf{x}_0})\boldsymbol{\cdot}\mathbf{r}'(t_0)=0.\]
Because \(f|S\) attains a relative maximum or minimum at \(\mathbf{x}_0\), the function \(h=f\circ\mathbf{r}:I\to\mathbb{R}\) attains a relative maximum or minimum at \(t_0\). Hence \(h'(t_0)=0\) and according to the chain rule we have:
\[h'(t_0)=\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{r}'(t_0)=0.\]
Thus \(\overrightarrow{\nabla} f(\mathbf{x}_0)\), like \(\overrightarrow{\nabla} g(\mathbf{x}_0)\), is orthogonal to the tangent vector \(\mathbf{r}'(t_0)\). Since this holds for every such curve, \(\overrightarrow{\nabla} f(\mathbf{x}_0)\) is orthogonal to the entire tangent space of \(S\) at \(\mathbf{x}_0\). Because \(\overrightarrow{\nabla} g(\mathbf{x}_0)\neq\mathbf{0}\), that tangent space is exactly the orthogonal complement of \(\overrightarrow{\nabla} g(\mathbf{x}_0)\), so \(\overrightarrow{\nabla} f(\mathbf{x}_0)\) and \(\overrightarrow{\nabla} g(\mathbf{x}_0)\) are parallel; that is, \(\overrightarrow{\nabla} f(\mathbf{x}_0)=\lambda \overrightarrow{\nabla} g(\mathbf{x}_0)\) for some \(\lambda\). \(\blacksquare\)
- The number \(\lambda\) in the above theorem is called a Lagrange multiplier.
- \(\lambda\) might be zero. This happens when \(\mathbf{x}_0\) is an unconstrained critical point of \(f\) that happens to lie on \(S\). For example, if we minimize \(f(x,y)=(x-1)^2+y^2\) subject to \(x^2+y^2=1\), the minimum occurs at \((1,0)\), where \(\overrightarrow{\nabla} f(1,0)=\mathbf{0}\) and hence \(\lambda=0\).
Note that to find the extremum of \(f|S\), we have \(n+1\) unknowns (\(n\) components of \(\mathbf{x}_0\) and \(\lambda\)) and \(n+1\) equations:
\[\label{Eq:LagrangeEQ} \left.\begin{align} \frac{\partial f}{\partial x_1}(\mathbf{x}_0)&=\lambda \frac{\partial g}{\partial x_1}(\mathbf{x}_0)\\ &\vdots\\ \frac{\partial f}{\partial x_n}(\mathbf{x}_0)&=\lambda \frac{\partial g}{\partial x_n}(\mathbf{x}_0)\\ g(\mathbf{x}_0)&=c \end{align}\right\} \quad (n+1) \text{ equations} \tag{i} \]
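In practice, we can hand system (i) to a computer algebra system. Here is a minimal generic sympy sketch; the objective \(f=xy\) on the circle \(x^2+y^2=2\) is a hypothetical illustration:

```python
import sympy as sp

def lagrange_candidates(f, g, c, variables):
    """Solve system (i): grad f = lambda * grad g, together with g = c."""
    lam = sp.symbols('lambda')
    eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v)) for v in variables]
    eqs.append(sp.Eq(g, c))
    return sp.solve(eqs, list(variables) + [lam], dict=True)

# Hypothetical example: extremize f = xy on the circle x^2 + y^2 = 2.
x, y = sp.symbols('x y', real=True)
for sol in lagrange_candidates(x*y, x**2 + y**2, 2, [x, y]):
    print(sol)  # (1, 1) and (-1, -1) give f = 1; (1, -1) and (-1, 1) give f = -1
```

Keep in mind that the candidates produced this way are not always the whole story, as the following list explains.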
If a function \(f\) subject to a constraint \(g(\mathbf{x})=c\) attains its extreme value at a point \(\mathbf{x}_0\), then \(\mathbf{x}_0\) is one of the following four types of points:
- \(\mathbf{x}_0\) is a point where \(\overrightarrow{\nabla} f(\mathbf{x}_0)=\lambda \overrightarrow{\nabla} g(\mathbf{x}_0)\),
- \(\mathbf{x}_0\) is a point where \(\overrightarrow{\nabla} g(\mathbf{x}_0)=\mathbf{0}\),
- \(\mathbf{x}_0\) is a rough point of \(f\) or \(g\) where \(\overrightarrow{\nabla} f(\mathbf{x}_0)\) or \(\overrightarrow{\nabla} g(\mathbf{x}_0)\) does not exist, or
- \(\mathbf{x}_0\) is on the boundary of the level set \(g(\mathbf{x})=c\).
Multiple Constraints
The method of Lagrange multipliers extends to the case of multiple constraints: say, extremizing \(f(x,y,z)\) subject to two constraints
\[\label{Eq:Lagrange_2Constraint} g_1(x,y,z)=c_1,\quad \text{and} \quad g_2(x,y,z)=c_2.\tag{ii}\]
Geometrically this means we seek the extreme values of \(f(x,y,z)\) on a curve \(C\) formed by the intersection of two level surfaces \(g_1(x,y,z)=c_1\) and \(g_2(x,y,z)=c_2\) (Fig. 4).
Suppose \(f\) subject to these constraints, \(f|C\), has an extremum at \(P=(x_0,y_0,z_0)\), and the functions \(f, g_1\), and \(g_2\) have continuous first partial derivatives near \(P\). Analogous to the case of one constraint, we can argue that \(\overrightarrow{\nabla} f\) is perpendicular to \(C\) at \(P\). Additionally, because the gradient is perpendicular to the level surface, both \(\overrightarrow{\nabla} g_1\) and \(\overrightarrow{\nabla} g_2\) are perpendicular to \(C\). If \(\overrightarrow{\nabla} g_1(x_0,y_0,z_0)\) and \(\overrightarrow{\nabla} g_2(x_0,y_0,z_0)\) are neither zero vectors nor collinear, then \(\overrightarrow{\nabla} f(x_0,y_0,z_0)\) lies in the plane spanned by the two vectors \(\overrightarrow{\nabla} g_1(x_0,y_0,z_0)\) and \(\overrightarrow{\nabla} g_2(x_0,y_0,z_0)\) and hence can be expressed as a linear combination of these two vectors, say:
\[\begin{align} \label{Eq:Lagrange_2Multipliers} \bbox[#F2F2F2,5px,border:2px solid black]{\overrightarrow{\nabla} f(x_0,y_0,z_0)=\lambda_1 \overrightarrow{\nabla} g_1(x_0,y_0,z_0)+\lambda_2 \overrightarrow{\nabla} g_2(x_0,y_0,z_0)}\tag{iii}\end{align}\]
In this method, Equations (ii) and (iii) must be solved simultaneously for the five unknowns \(x_0, y_0, z_0, \lambda_1,\) and \(\lambda_2\):
\[\left\{\begin{array}{l} \dfrac{\partial f}{\partial x}(x_0,y_0,z_0)=\lambda_1 \dfrac{\partial g_1}{\partial x}(x_0,y_0,z_0)+\lambda_2 \dfrac{\partial g_2}{\partial x}(x_0,y_0,z_0)\\ \\ \dfrac{\partial f}{\partial y}(x_0,y_0,z_0)=\lambda_1 \dfrac{\partial g_1}{\partial y}(x_0,y_0,z_0)+\lambda_2 \dfrac{\partial g_2}{\partial y}(x_0,y_0,z_0)\\ \\ \dfrac{\partial f}{\partial z}(x_0,y_0,z_0)=\lambda_1 \dfrac{\partial g_1}{\partial z}(x_0,y_0,z_0)+\lambda_2 \dfrac{\partial g_2}{\partial z}(x_0,y_0,z_0)\\ \\ g_1(x_0,y_0,z_0)=c_1\\ \\ g_2(x_0,y_0,z_0)=c_2 \end{array}\right.\]
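Here is a minimal sympy sketch of this five-equation system; the particular objective and constraints below are hypothetical illustrations, not taken from the text:

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)

# Hypothetical example: extremize f = x + 2y + 3z on the curve where the
# plane x - y + z = 1 meets the cylinder x^2 + y^2 = 1.
f = x + 2*y + 3*z
g1, c1 = x - y + z, 1
g2, c2 = x**2 + y**2, 1

# Equation (iii) componentwise, together with the two constraints (ii).
eqs = [sp.Eq(sp.diff(f, v), l1*sp.diff(g1, v) + l2*sp.diff(g2, v))
       for v in (x, y, z)]
eqs += [sp.Eq(g1, c1), sp.Eq(g2, c2)]

for sol in sp.solve(eqs, [x, y, z, l1, l2], dict=True):
    print(sol, sp.simplify(f.subs(sol)))  # f = 3 + sqrt(29) (max), 3 - sqrt(29) (min)
```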
By similar reasoning, we obtain equations for minimizing or maximizing
\(f(x_1,\cdots,x_n)\) subject to several constraints
\[\begin{align} \label{Eq:m-constraints} \left\{\begin{array}{l} g_1(x_1,\cdots,x_n)=c_1\\ g_2(x_1,\cdots,x_n)=c_2\\ \vdots\\ g_m(x_1,\cdots,x_n)=c_m \end{array}\right.\tag{iv}\end{align}\]
where \(m<n\). Assume \(f(x_1,\cdots,x_n)\) has a relative extremum at \(\mathbf{x}_0\) when the variables are restricted by the constraint equations (iv). If \(f, g_1,\cdots,\) and \(g_m\) have continuous first partial derivatives at all points near \(\mathbf{x}_0\), and if the gradients \(\overrightarrow{\nabla} g_1(\mathbf{x}_0),\cdots,\overrightarrow{\nabla} g_m(\mathbf{x}_0)\) are linearly independent (that is, none of them is a linear combination of the others), then there exist \(m\) real numbers \(\lambda_1,\cdots, \lambda_m\) such that
\[\bbox[#F2F2F2,5px,border:2px solid black]{\overrightarrow{\nabla} f(\mathbf{x}_0)=\lambda_1\overrightarrow{\nabla} g_1(\mathbf{x}_0)+\cdots+\lambda_m \overrightarrow{\nabla} g_m(\mathbf{x}_0).}\]
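For larger systems, solving these equations symbolically can be impractical, and a numerical optimizer is a common alternative. Below is a minimal scipy sketch; the objective and the two constraints are hypothetical illustrations, and SLSQP accepts several equality constraints at once:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: minimize |v|^2 in R^3 subject to two equality constraints.
objective = lambda v: v @ v
constraints = [
    {'type': 'eq', 'fun': lambda v: v[0] + v[1] + v[2] - 1},  # g1(v) = 1
    {'type': 'eq', 'fun': lambda v: v[0] - v[1] - 2},          # g2(v) = 2
]

result = minimize(objective, x0=np.ones(3), method='SLSQP',
                  constraints=constraints)
print(result.x)  # approximately [4/3, -2/3, 1/3]
```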
1 For instance, in this specific example,
\[\begin{cases} x=X(t)=\dfrac{3}{\sqrt{2}}\cos t+2\sqrt{2} \sin t+1\\ \\ y=Y(t)=-\dfrac{3}{\sqrt{2}}\cos t+2\sqrt{2}\sin t-1 \end{cases}\qquad (0\leq t\leq2\pi)\]
2 In other words, \(f\) and \(g\) have continuous partial derivatives.