Subject: Calculus

Hessian Matrix

Second derivatives, whether in single-variable or multivariable calculus, measure the rate of change of slopes, that is, the curvature of the function. What makes problems harder in multivariable calculus is that we have slopes in infinitely many directions, the directional derivatives. So we need some way to examine how this infinite collection of slopes changes, to help us determine the curvature and shape of the function f near its critical points.

This brings to mind something like second directional derivatives. We summarized the information about slopes by creating a vector of first partial derivatives, the gradient. In a similar way, we can summarize the information about the rate of change of slopes by creating a matrix of second partial derivatives, the Hessian.

The Hessian matrix can be regarded as a higher-order analogue of the Jacobian matrix: for a sufficiently smooth function it is the Jacobian of the gradient, and it extends the same idea to describe the local curvature of a function. It is written as a square matrix of second-order partial derivatives of the function.

Let us consider a function f(x_1, x_2, x_3, …, x_n). If all of its second-order partial derivatives exist, then we can write the Hessian of this function as,

H(f)_{ij}(x)=D_i D_j f(x)

where x = (x_1, x_2, x_3, …, x_n) and D_i, D_j are the differentiation operators with respect to the i^{th} and j^{th} arguments. We can then write the Hessian in its matrix form as follows,

H(f)=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}
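
As a quick illustration, here is a minimal sketch of building a Hessian symbolically with the sympy library (the function f(x, y) = x^3 y + y^2 is just an arbitrary example chosen for this sketch, not one from the text):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 * y + y**2          # arbitrary illustrative function

    # sp.hessian builds the square matrix of second partial derivatives
    H = sp.hessian(f, (x, y))
    print(H)                     # Matrix([[6*x*y, 3*x**2], [3*x**2, 2]])

Note that the off-diagonal entries agree, as they must whenever the mixed second partials are continuous (Clairaut's theorem), so the Hessian is a symmetric matrix.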

Both the Hessian matrix and its determinant are referred to as "the Hessian" (just as with the Jacobian); context makes clear which is meant.

To fully understand the Hessian matrix, let us recall some matrix theory. Suppose that we have an n-row by n-column (square) matrix \mathbf{M}. Then \mathbf{M} is:

Positive Definite if x^T M x > 0 for every n-vector x \neq 0. There are a few tests to determine when a matrix is positive definite. One is that the determinants of all of its leading principal submatrices (the submatrices made up of its first k rows and columns, for k = 1, …, n) are all positive. Another test for positive definiteness is that the eigenvalues of M are all positive real numbers.
Negative Definite if x^T M x < 0 for every n-vector x \neq 0. Again, there are a few tests to determine when a matrix is negative definite. One is that the determinants of all of its leading principal submatrices are nonzero and alternate in sign, with the first being negative. Another test for negative definiteness is that the eigenvalues of M are all negative real numbers. (Both determinant tests are sketched in code right after these definitions.)
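
Here is a minimal sketch of the leading-principal-minor tests in Python with numpy; the helper names are my own, and the small tolerance guards against floating-point round-off:

    import numpy as np

    def is_positive_definite(M, tol=1e-12):
        # Leading principal minors: the determinant of the top-left
        # k x k block must be positive for every k = 1, ..., n.
        n = M.shape[0]
        return all(np.linalg.det(M[:k, :k]) > tol for k in range(1, n + 1))

    def is_negative_definite(M, tol=1e-12):
        # Leading principal minors must alternate in sign starting
        # negative, i.e. the k-th minor has the sign of (-1)**k.
        n = M.shape[0]
        return all((-1) ** k * np.linalg.det(M[:k, :k]) > tol
                   for k in range(1, n + 1))

    print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))    # True
    print(is_negative_definite(np.array([[-2.0, 1.0], [1.0, -2.0]])))  # True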

If we happen to have a few zero determinants (which imply a zero eigenvalue; we'll see why next semester!), then we can say that M is:

Positive Semi-definite if the determinants of all of its principal submatrices (not just the leading ones) are non-negative, or if all of its eigenvalues are non-negative real numbers. Note that any positive definite matrix also satisfies the definition of positive semi-definite.
Negative Semi-definite if the determinants of all of its principal submatrices alternate in sign, starting with a negative one (with the allowance here that determinants of 0 may replace one or more of the positive or negative values). Another test is to check whether all of its eigenvalues are non-positive real numbers. Note that any negative definite matrix also satisfies the definition of negative semi-definite. (The eigenvalue versions of all four tests are sketched below.)
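
Since the number of principal submatrices grows combinatorially with n, the eigenvalue versions of these tests are easier to code. Here is a hedged sketch in Python; the function name definiteness is mine, and np.linalg.eigvalsh assumes a symmetric matrix, which the Hessian of a twice continuously differentiable function always is:

    import numpy as np

    def definiteness(M, tol=1e-12):
        # Classify a symmetric matrix by the signs of its eigenvalues.
        eig = np.linalg.eigvalsh(M)   # real eigenvalues, ascending order
        if np.all(eig > tol):
            return "positive definite"
        if np.all(eig < -tol):
            return "negative definite"
        if np.all(eig >= -tol):
            return "positive semi-definite"
        if np.all(eig <= tol):
            return "negative semi-definite"
        return "indefinite"

    print(definiteness(np.array([[0.0, 0.0], [0.0, 6.0]])))  # positive semi-definite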

Let's look at the function f(x,y)=40+x^3(x-4)+3(y-5)^2, which has first partial derivatives f_x=x^2(4x-12) and f_y=6(y-5), and hence the critical points (3, 5) and (0, 5). Let us look at the discriminant of this function, f_{xx}f_{yy}-f_{xy}^2. This is exactly the determinant of the 2×2 Hessian matrix.

Our Hessian matrix now looks like this:

H_{f}(x,y) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{bmatrix} = \begin{bmatrix} 12x^2-24x & 0 \\ 0 & 6 \end{bmatrix}
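
We can double-check this symbolically; here is a short sympy sketch (with f entered in expanded form so the printed entries match the matrix above):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = 40 + x**4 - 4*x**3 + 3*(y - 5)**2   # x**3*(x - 4) expanded

    H = sp.hessian(f, (x, y))
    print(H)                     # Matrix([[12*x**2 - 24*x, 0], [0, 6]])
    print(H.subs({x: 3, y: 5}))  # Matrix([[36, 0], [0, 6]])
    print(H.subs({x: 0, y: 5}))  # Matrix([[0, 0], [0, 6]])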

At the critical point (3, 5), our Hessian matrix becomes: H_f (3,5) = \begin{bmatrix} 36 & 0\\ 0 & 6 \end{bmatrix}. The discriminant of this matrix, 36(6) - 0 = 216, is greater than 0, so we know we have a maximum or a minimum (we have only ruled out a saddle point). In terms of matrix theory, at this stage we cannot yet distinguish positive definite from negative definite (both have a positive determinant for the whole 2×2 matrix). Looking at the sign of f_{xx} (which is positive), we see that we have a minimum; this is analogous to checking the sign of the first leading principal submatrix's determinant. Thus all of the leading principal submatrices have positive determinants, and the Hessian matrix is positive definite at the critical point (3, 5).

Let's look at the critical point (0, 5). Here is the Hessian matrix: H_f (0,5) = \begin{bmatrix} 0 & 0\\ 0 & 6 \end{bmatrix}. Note that the determinant is (0)(6) - 0 = 0, so according to the book we are unable to classify the critical point. Using our newfound powers, we can discriminate further. A quick check of the eigenvalues of this matrix yields {0, 6}, so the Hessian matrix is positive semi-definite at the critical point (0, 5). Thus, we know that we cannot have a local maximum (since all local maxima have negative semi-definite Hessians), and we have either a local minimum or a saddle point. (In fact, along the line y = 5 the function behaves like 40 - 4x^3 near x = 0, which falls below f(0,5) = 40 on one side and rises above it on the other, so (0, 5) is a saddle point.)

So, no matter how many variables our problem has, all we have to do is determine whether the Hessian matrix is positive definite or negative definite at a critical point to classify that point exactly: positive definite means a local minimum, and negative definite means a local maximum. (A code sketch of this recipe follows.)
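
Here is a hedged sketch of that recipe as a small Python function; the name classify_critical_point is mine, and it takes the symmetric Hessian already evaluated at the critical point:

    import numpy as np

    def classify_critical_point(H):
        # Classify a critical point from its symmetric Hessian there.
        eig = np.linalg.eigvalsh(H)
        if np.all(eig > 0):
            return "local minimum"      # positive definite
        if np.all(eig < 0):
            return "local maximum"      # negative definite
        if np.any(eig > 0) and np.any(eig < 0):
            return "saddle point"       # indefinite
        return "inconclusive (semi-definite)"   # a zero eigenvalue appears

    print(classify_critical_point(np.array([[36.0, 0.0], [0.0, 6.0]])))  # local minimum
    print(classify_critical_point(np.array([[0.0, 0.0], [0.0, 6.0]])))   # inconclusive (semi-definite)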

If we are lucky enough to have the Hessian matrix be positive semi-definite at every point (so that the function is convex), or negative semi-definite at every point (so that it is concave), not just at the critical point we are looking at, then we know even more: a critical point is then a global minimum or a global maximum, respectively.

NEXT TOPIC: Curvature