A New Method for Head Direction Estimation Based on the Dlib Face Detection Method and the Inverse Sine Function

The detection and tracking of head movements has been an active area of research in recent years. It contributes strongly to computer vision and supports many of its applications. Several face detection methods and algorithms have been proposed because they are required in most modern applications, where they act as the cornerstone of many interactive systems. The detected angles of head direction are useful in many fields, such as assistance for disabled people, tracking of criminal behavior, and other medical applications. In this paper, a new method is proposed to estimate the angles of head direction based on the Dlib face detection algorithm, which predicts 68 landmarks on the human face. The calculations rely mainly on the predicted landmarks to estimate three angles: Yaw, Pitch, and Roll. A Python program was designed to perform face detection and direction estimation. To ensure accurate estimation, particular landmarks were selected such that they are not affected by the movement of the head, so the calculated angles are approximately accurate. The experimental results showed high accuracy for all three angles with respect to the real and predicted measures. The sample standard deviation between the real and calculated angles was 0.0046 for Yaw, 0.0077 for Pitch, and 0.0021 for Roll, which confirms the accuracy of the proposed method compared with other studies. Moreover, the method runs fast, which supports accurate online tracking.


Introduction
The estimation of head direction and the tracking of facial expressions have had great importance over the past 25 years. Research has applied these concepts in different fields such as self-driving technologies, human-machine interaction, 3D face reconstruction, and other topics that manipulate multimedia content. Historically, several major approaches have been proposed for face modeling, such as discriminative/landmark-based approaches, which are able to predict landmarks on the human face [1].
Many state-of-the-art face detection methods follow the landmark-based approach. Methods such as Viola-Jones [2][3] are intended to detect faces in a given image or video frame. The Viola-Jones method localizes the human face based on trained face images by applying Haar-like features classified by an AdaBoost classifier; in fact, it gives very reasonable results as an online face detector [3].
As is commonly known, the Viola-Jones method and its enhancements rely on training images of the face pose to estimate whether it faces up or down, left or right. Although some of these methods use more training images for exact angles, they are still inaccurate in detecting head direction [4]. Therefore, improved methods have been proposed that make use of predicted landmarks of the human face. A landmark is a point, represented by a pixel in the image, that refers to a certain position on the human face. In this way, the Dlib face detection method was proposed to predict 68 landmarks that localize the face in a given 2D image [5]. These landmarks can be used to estimate the head direction in degrees if they are employed in a proper method. Some authors proposed gaze estimation based on a combination of head and eye directions using Euler angles; this method depends on training images that simulate these angles to estimate the gaze angle, and it produces approximately accurate results [6].
Therefore, this paper proposes a new method for head direction estimation based on the predicted landmarks obtained from Dlib face detection. The proposed method uses no training images to estimate the head direction; instead, it calculates the direction of the human head accurately by applying a mathematical model of trigonometric functions. This reduces computation time and provides accurate head direction tracking, especially in real-time video processing. The paper is organized as follows: Section 2 reviews related work. Section 3 introduces the estimation of head direction. Section 4 presents the proposed method. Section 5 reports the experimental results. Finally, conclusions are drawn in Section 6.

Related Works
During the last decade, many methods for head direction estimation have been proposed. In [7], the authors estimate the head direction based on Dlib landmarks and an ellipsoidal method. Several complex equations are used to calculate the Yaw angle, and the results are compared against a dataset of images in which the Yaw angle differs by 5 degrees from one image to the next. Based on the selected landmarks, the authors calculate the Yaw angle from the apparent sides of the face. However, this is not the best selection, because the relation between the nose landmark and the last visible area of the face is not common to all people; moreover, the method offers no way to estimate the other angles. In [8], the authors propose a 2D-to-3D model transformation, in which the given 2D image is matched to a 3D model built from a set of training images. The method compares the given 2D image with the 2D images of the 3D models in the training set, and the 3D information of the matched image is then used for head direction estimation through a complex set of equations and matrices. It also uses the tangent function, which is undefined at 90 degrees, a significant problem. This method further assumes that similarity between 2D images implies the same depth of the face features, which is not a proper assumption. In [6], the authors propose an Euler-angles-based method to estimate gaze by tracking the direction of the head from face landmarks extracted with the Dlib method. The angles are determined by comparing the captured frame with a set of training images, each referring to a specific angle, using a neural network, which is expensive in terms of time and processing resources. In [9], the authors propose a neural network with three separate branches to estimate the three angles of the head pose.
Average top-k regression is used to predict these angles by comparing the given image with the set of training images.

The Head Direction
Normally, the human head rests on the first vertebra of the spine, known as the atlas vertebra, which sits above the axis vertebra around which the head rotates, as shown in Figure (1). This movement arises from the contraction of the front and back muscles responsible for the motions of the head. These motions can be described as changes in the angles of the head direction [10]. These angles are:

Yaw Angle
The head rotates around its vertical axis, turning left and right, which creates a rotation angle between -75 and 75 degrees. This movement therefore allows a rotation of up to 150 degrees from right to left [11], as shown in Figure 1.

Pitch Angle
The head tilts forward and backward, forming a tilt angle between -60 and 60 degrees; that is, it allows a movement ranging over 120 degrees from bottom to top [11], as shown in Figure 1.

Roll Angle
The head tilts toward the right and left shoulders, forming a lateral tilt angle between -40 and 40 degrees; that is, it allows a lateral tilt ranging over 80 degrees from right to left [11], as shown in Figure 1.

The Proposed Method
Mainly, this paper proposes three models to estimate the angles of the head direction, in which the landmarks predicted by Dlib are utilized. Some of the face landmarks are useful for calculating the Euclidean distance between two points, or between a point and a line crossing two other points. These distances are then used in trigonometric functions that estimate the proper angles. To that end, three contributions are proposed.

To estimate the Yaw angle, the paper proposes a model based on two pairs of points, (17, 21) and (22, 26), as shown in Figure 2. The 2D distance between each pair of points is calculated as in eq (1):

d = √((x2 − x1)² + (y2 − y1)²) … (1)

Then, the Yaw angle θ is calculated as in eq (2), where θ is the Yaw angle, A is the Euclidean distance between points 17 and 21, and B is the Euclidean distance between points 22 and 26. As shown in Figure 2 and eq (2), the distances computed by eq (1) feed the inverse sine that estimates the Yaw angle. Figure 3 illustrates the relationships among these states.

For the Pitch angle, another model is proposed. Three points (1, 15, and 29), selected because they form a triangle, are used as shown in Figure 6. The perpendicular distance between point 29 and the line connecting points 1 and 15 is then calculated, as shown in the Python formula of eq (3):

Vertical Distance (VD) = np.cross(p2 − p1, p3 − p1) / np.linalg.norm(p2 − p1) … (3)

where the Vertical Distance is the shortest distance between p3 and the line p1p2, p1 is landmark 1, p2 is landmark 15, and p3 is landmark 29.
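As a minimal sketch, the distance of eq (1) and the Yaw estimate can be written as follows. Since the body of eq (2) is not reproduced above, the ratio arcsin((A − B)/(A + B)) is an assumed stand-in illustrating the inverse-sine idea, not the paper's exact formula; the landmark indices follow Dlib's 68-point scheme.

```python
import numpy as np

def euclidean(p, q):
    """Eq (1): 2D Euclidean distance between two landmarks."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.linalg.norm(q - p))

def yaw_angle(landmarks):
    """Hedged Yaw sketch: A and B are the eyebrow spans (17-21, 22-26).

    arcsin((A - B) / (A + B)) is an ASSUMED stand-in for eq (2): it is
    0 degrees for a frontal, symmetric face and grows with the
    asymmetry induced by turning the head.
    """
    A = euclidean(landmarks[17], landmarks[21])
    B = euclidean(landmarks[22], landmarks[26])
    return float(np.degrees(np.arcsin((A - B) / (A + B))))
```

With a symmetric frontal face the two spans are equal and the sketch returns 0 degrees, matching the intuition that the inverse sine measures the left-right asymmetry of the eyebrows.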
Then, the Euclidean distances between landmarks 1 and 29 and between landmarks 15 and 29 are calculated as in eq (1), and θ1 and θ2 are obtained with the inverse sine function as in eqs (4) and (5). Finally, the average of the two angles represents the Pitch angle, as shown in eq (6):

Pitch = (θ1 + θ2) / 2 … (6)

The selection of these points was performed carefully: they lie almost on the same line that passes through the center of the head along its horizontal axis, and they are not affected by other head movements. Averaging θ1 and θ2 smooths the Pitch estimate, since a movement may increase one of them while decreasing the other. The inverse sine is used because its value grows quickly for small angles, which makes a noticeable difference in the angle's measure. The intermediate states are illustrated in Figure 8.
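A sketch of the Pitch computation, under the assumption that eqs (4) and (5) take the inverse sine of the vertical distance over the two Euclidean distances d(1, 29) and d(15, 29); the 2D cross product of eq (3) is written out explicitly so the code stands alone.

```python
import numpy as np

def pitch_angle(p1, p15, p29):
    """Hedged Pitch sketch (eqs (3)-(6)); the exact forms of eqs (4)
    and (5) are assumed to be arcsin(VD / d(1,29)) and
    arcsin(VD / d(15,29))."""
    p1, p15, p29 = (np.asarray(p, dtype=float) for p in (p1, p15, p29))
    v, w = p15 - p1, p29 - p1
    # Eq (3): signed perpendicular distance from landmark 29 to the
    # line (1, 15) -- the 2D cross product divided by the line length.
    vd = (v[0] * w[1] - v[1] * w[0]) / np.linalg.norm(v)
    theta1 = np.degrees(np.arcsin(vd / np.linalg.norm(p29 - p1)))   # eq (4)
    theta2 = np.degrees(np.arcsin(vd / np.linalg.norm(p29 - p15)))  # eq (5)
    return float((theta1 + theta2) / 2.0)                           # eq (6)
```

When landmark 29 lies on the line through landmarks 1 and 15 the vertical distance is zero and the sketch returns a Pitch of 0 degrees, as expected for a level head.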
Finally, two distances are calculated with the Euclidean formula. The first is the real distance (rd), calculated between points 1 and 15. The second is the virtual distance (vd), calculated between point 1 and a virtual point having the x-coordinate of point 15 and the y-coordinate of point 1. Both distances are calculated according to eq (1). The inverse sine function is then used to calculate the Roll angle as shown in eq (7), where rd is the distance between the two real points and vd is the distance between one real point and the virtual point. A positive Roll value means the head rolls to the right; a negative value means it rolls to the left; a value of 0 means the head is at the center of the roll range.

The code was executed on a Lenovo machine with an Intel Core i5 CPU, 4 GB of RAM, and a common web camera. The open-source Dlib, Math, NumPy, and time libraries were installed. The code was tested on a sample of 25 persons, because the known training datasets provide only a 5-degree difference between images.
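The Roll construction above can be sketched as follows. Since rd² = vd² + dy² for the virtual-point triangle, the magnitude arccos(vd/rd) equals the inverse sine of |dy|/rd; the sign convention (positive = roll right, with image y pointing down) is an assumption, not taken from the paper.

```python
import numpy as np

def roll_angle(p1, p15):
    """Hedged Roll sketch (eq (7)).

    rd is the real distance between landmarks 1 and 15; vd is the
    distance from landmark 1 to a virtual point taking the x of
    landmark 15 and the y of landmark 1. The sign of the result is
    an ASSUMED convention based on the vertical offset of the two
    real landmarks.
    """
    p1, p15 = np.asarray(p1, dtype=float), np.asarray(p15, dtype=float)
    virtual = np.array([p15[0], p1[1]])
    rd = np.linalg.norm(p15 - p1)      # real distance, via eq (1)
    vd = np.linalg.norm(virtual - p1)  # virtual (horizontal) distance
    magnitude = np.degrees(np.arccos(np.clip(vd / rd, -1.0, 1.0)))
    return float(np.copysign(magnitude, p1[1] - p15[1]))
```

For a level head the virtual point coincides with landmark 15, so vd = rd and the Roll is 0 degrees; tilting one side of the face up or down shrinks vd relative to rd and the angle grows accordingly.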

Experimental Results
The proposed method was applied to the samples by having each person move their head to different angles, recording the real angles simultaneously with the measurements produced by the application implementing the proposed method. The execution time of each video and the frames per second (FPS) were also recorded. Each person sat 70 cm away from the camera. The results of sample (1) are listed in Table 1. During the experiments, every person was asked to perform a set of head movements, each with a predetermined angle value (real angle); the calculated angles were then recorded and compared with the real angles.
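The per-video timing can be collected with a small harness like the following sketch; `process_frame` is a hypothetical placeholder for the detection-plus-angles step, so the names here are assumptions, not taken from the paper's code.

```python
import time

def measure_fps(process_frame, frames):
    """Time a per-frame pipeline and report the average seconds per
    frame and the resulting FPS, as recorded in the experiments
    (about 0.07 s/frame, i.e. roughly 14 FPS, on the test machine)."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    per_frame = elapsed / len(frames)
    return per_frame, 1.0 / per_frame
```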
Recall was computed to account for false predictions, as shown in eq (8):

Recall = TP / (TP + FN) … (8)

where TP is the number of true predictions and FN the number of missed (false) predictions. After the results were analyzed, the average error between the real and calculated measures was 0.11 degrees for the Yaw angle, 0.12 degrees for the Pitch angle, and 0.1 degrees for the Roll angle. These results show no more than 0.14 degrees of error in any calculated angle, which supports the accuracy of the proposed method. As for computation time, the average processing time for both face detection and angle measurement was 0.07 seconds, which indicates high efficiency and allows the head direction to be tracked at 14 frames per second, as confirmed by the calculated FPS. Figures 4, 5, and 7 show the experiments for the three angles with sample (1).
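The bookkeeping behind eq (8) and the per-angle average error can be sketched as follows; the function names are illustrative, not taken from the paper's code.

```python
def recall(tp, fn):
    """Eq (8): Recall = TP / (TP + FN), the fraction of head states
    the method predicted correctly out of all real states."""
    return tp / (tp + fn)

def mean_abs_error(real, calculated):
    """Average absolute difference (in degrees) between the real and
    calculated angles, as reported per angle (e.g. 0.11 for Yaw)."""
    return sum(abs(r - c) for r, c in zip(real, calculated)) / len(real)
```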
The comparison results for the whole sample set are listed in Table 2. The sample standard deviation was calculated over the entire results for each angle using eq (9):

s = √( Σ (X − X̄)² / (N − 1) ) … (9)

where X is a value of the data distribution, X̄ is the sample mean, and N is the total number of observations. The Yaw standard deviation is 0.0046, the Pitch 0.0077, and the Roll 0.0021. These values show no significant difference between the real and calculated measures of the given angles. Based on this analysis and the calculated standard deviation values, the proposed method achieves high accuracy. Moreover, it performs faster than existing methods, with a 0.07-second processing time, i.e., 14 frames per second. A major contribution of the proposed method is the selection of the right face landmarks to enhance the accuracy of angle estimation.
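Eq (9) translates directly into code:

```python
import math

def sample_std(values):
    """Eq (9): sample standard deviation,
    s = sqrt(sum((X - mean)^2) / (N - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
```

Note the N − 1 (Bessel-corrected) denominator, which matches the "sample" standard deviation named in the text rather than the population form.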

Conclusions
This paper proposed a new method for head direction estimation using the Dlib method and trigonometric functions. The proposed method does not require training beyond the single model used to detect the face. The estimation is based on predicted landmarks and mathematical calculation. A notable point of this method is that the landmarks used were selected carefully, yielding accurate angle estimation in comparison with previous works.
The low standard deviation of the calculated angles, together with the low error rate, demonstrates high performance and proves the efficiency of the method. The proposed method may therefore be combined with more advanced face detection methods or used in other applications that require head direction estimation.