Subject: Calculus
Hessian Matrix
Second derivatives, whether in single-variable or multivariable calculus, measure the rate of change of slopes, that is, the curvature of a function. What makes multivariable problems harder is that there are slopes in infinitely many directions, the directional derivatives. So we need some way to examine how this infinite collection of slopes changes in order to determine the curvature and shape of the function f near its critical points.
This brings to mind something like second directional derivatives. We summarized the information about slopes by collecting the first partial derivatives into a vector, the gradient. In the same spirit, we can summarize the information about how those slopes change by collecting the second partial derivatives into a matrix, the Hessian.
The Hessian matrix can be regarded as a higher-order analogue of the Jacobian matrix, in that it extends the same idea to describe the local curvature of a function; in fact, the Hessian of f is the Jacobian of its gradient. It is written as a square matrix of the second-order partial derivatives of the function.
Let us consider a function f(x_1, x_2, x_3, \ldots, x_n). If all of its second-order partial derivatives exist, then we can write the entries of the Hessian of this function as
H(f)_{ij}(\mathbf{x}) = D_i D_j f(\mathbf{x}) = \frac{\partial^2 f}{\partial x_i \, \partial x_j}(\mathbf{x}),
where \mathbf{x} = (x_1, x_2, x_3, \ldots, x_n) and D_i, D_j are the differentiation operators with respect to the i^{th} and j^{th} arguments. We can then write the Hessian in its matrix form as follows:
H_f(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}
Both the Hessian matrix and its determinant are commonly referred to simply as "the Hessian" (the same naming convention applies to the Jacobian).
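To make the definition concrete, here is a minimal sketch (assuming the SymPy library is available; the three-variable function used is just an illustration, not one from this note) that builds a Hessian symbolically and confirms each entry is a second-order partial derivative.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * x2 + sp.sin(x3) * x1          # an arbitrary illustrative function

# n x n matrix whose (i, j) entry is the second partial with respect to x_i, x_j
H = sp.hessian(f, [x1, x2, x3])
print(H)

# Mixed partials agree here because f is smooth (Clairaut's theorem)
assert H[0, 1] == sp.diff(f, x1, x2)
```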
To fully understand the Hessian matrix, we need a little matrix theory. Suppose that we have an n-row by n-column (square) symmetric matrix \mathbf{M}. Then \mathbf{M} is:
- positive definite if all of its eigenvalues are positive; equivalently, every leading principal submatrix (the upper-left 1 \times 1, 2 \times 2, \ldots, n \times n blocks) has a positive determinant;
- negative definite if all of its eigenvalues are negative; equivalently, the determinants of the leading principal submatrices alternate in sign, starting with a negative 1 \times 1 determinant;
- indefinite if it has both positive and negative eigenvalues.
If we happen to have some zero determinants among these (a zero determinant of \mathbf{M} itself implies a zero eigenvalue, we'll see why next semester!), then the best we can say is that \mathbf{M} is:
- positive semi-definite if all of its eigenvalues are greater than or equal to zero;
- negative semi-definite if all of its eigenvalues are less than or equal to zero.
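Here is a small numerical sketch of these categories (assuming NumPy; the test matrices are arbitrary illustrations): for a symmetric matrix, the signs of the eigenvalues tell us which case we are in.

```python
import numpy as np

def classify(M, tol=1e-12):
    # Real eigenvalues of a symmetric matrix, in ascending order
    eigs = np.linalg.eigvalsh(M)
    if np.all(eigs > tol):
        return "positive definite"
    if np.all(eigs < -tol):
        return "negative definite"
    if np.all(eigs >= -tol):
        return "positive semi-definite"
    if np.all(eigs <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.diag([2.0, 3.0])))                 # positive definite
print(classify(np.array([[0.0, 0.0], [0.0, 1.0]])))  # positive semi-definite
print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))  # indefinite (eigenvalues 3 and -1)
```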
Let's look at the function f(x,y) = 40 + x^3(x-4) + 3(y-5)^2, which has first partial derivatives f_x = x^2(4x - 12) and f_y = 6(y - 5), and critical points (3, 5) and (0, 5). Now consider the discriminant of this function, f_{xx} f_{yy} - f_{xy}^2. This is exactly the determinant of the 2 \times 2 Hessian matrix.
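As a quick sanity check (a SymPy sketch, not part of the original worked example), we can verify the first partial derivatives and the critical points:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 40 + x**3 * (x - 4) + 3 * (y - 5)**2

fx, fy = sp.diff(f, x), sp.diff(f, y)
print(sp.factor(fx), fy)              # 4*x**2*(x - 3), i.e. x^2(4x - 12), and 6*(y - 5)
print(sp.solve([fx, fy], [x, y]))     # [(0, 5), (3, 5)]
```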
Our Hessian matrix now looks like this:
H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy}\\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 12x^2 - 24x & 0\\ 0 & 6 \end{bmatrix}
At the critical point (3, 5), our Hessian matrix becomes H_f (3,5) = \begin{bmatrix} 36 & 0\\ 0 & 6 \end{bmatrix}. The discriminant of this matrix, 36(6) - 0 = 216, is greater than 0, so we know we have a maximum or a minimum (we have only ruled out a saddle point). In terms of the matrix theory above, at this point we cannot yet rule out positive definite or negative definite (both have a positive determinant for the whole 2 \times 2 matrix). Looking at the sign of f_{xx} (which is positive), we see that we have a minimum; this is analogous to checking the sign of the first leading principal submatrix's determinant. Thus all of the leading principal submatrices have positive determinants, and the Hessian matrix is positive definite at the critical point (3, 5).
Let's look at the critical point (0, 5). Here is the Hessian matrix: H_f (0,5) = \begin{bmatrix} 0 & 0\\ 0 & 6 \end{bmatrix}. Note that the determinant is (0)(6) - 0 = 0, so according to the book's discriminant test we are unable to classify the critical point. Using our newfound powers, though, we can discriminate further. A quick check of the eigenvalues of this matrix yields {0, 6}, so the Hessian matrix is positive semi-definite at the critical point (0, 5). That tells us we cannot have a local maximum (since every local maximum has a negative semi-definite Hessian), so we know we have either a local minimum or a saddle point.
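Continuing the same kind of sketch (again assuming SymPy; this just reproduces the numbers above), we can evaluate the Hessian, its determinant, and its eigenvalues at both critical points:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 40 + x**3 * (x - 4) + 3 * (y - 5)**2
H = sp.hessian(f, [x, y])

for pt in [(3, 5), (0, 5)]:
    Hp = H.subs({x: pt[0], y: pt[1]})
    print(pt, Hp.tolist(), Hp.det(), Hp.eigenvals())
# (3, 5): [[36, 0], [0, 6]], det 216, eigenvalues {36, 6} -> positive definite
# (0, 5): [[0, 0], [0, 6]],  det 0,   eigenvalues {0, 6}  -> positive semi-definite
```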
So, no matter how many variables we have in our problem, all we have to do is determine whether the Hessian matrix is positive definite or negative definite at a critical point to classify that critical point exactly as a local minimum or a local maximum, respectively.
If we are lucky enough to have the Hessian matrix be positive semi-definite or negative semi-definite at every point (not just at the critical point we are looking at), then we know even more: the critical point is a global minimum or a global maximum, respectively.
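As a small illustration of both points (the three-variable function g below is my own example, not taken from the text): when the Hessian is positive definite everywhere, the eigenvalue test classifies the critical point just as easily in three variables as in two, and the critical point is in fact a global minimum.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
g = x**2 + 2*y**2 + 3*z**2 + x*y      # quadratic with its only critical point at the origin
H = sp.hessian(g, [x, y, z])          # constant matrix [[2, 1, 0], [1, 4, 0], [0, 0, 6]]

print(H.eigenvals())                  # {3 - sqrt(2): 1, 3 + sqrt(2): 1, 6: 1}, all positive
# The Hessian is positive definite at every point, so the origin is not just a
# local minimum but the global minimum of g.
```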
NEXT TOPIC: Curvature