Updating Using the Predictive Online Method
We do not perform a full reconstruction from every corresponding set of frames. Instead, the previously constructed model is updated with the current 3D information. We then find the difference between the previous and current sets of frames; the frames that exhibit an intensity change are the ones that affect the current temporal 3D model. Therefore, we use only this small subset of points to estimate the changes in the new model.
Table 3. Predictive algorithm of the online reconstruction
Step 1: Updating using prediction of the linear statistical model (Section IV A)
1.1 Predicting the 3D structure points
1.2 Generating the predicted structure matrix
1.3 Removing the wrong 3D points
Step 2: Validating the predicted values using the feature-based method (Section IV B)
2.1 Validating the predicted feature points
2.2 Validating the removed wrong feature points
2.3 Repeating Step 1 and Step 2 until there are no more frame subsequences
In the structure from motion approach, a number of feature points are tracked and a measurement matrix is formed in which each element corresponds to the image coordinates of a tracked point. The factorization method is then used to recover the camera motion and the 3D model from those points. In any realistic situation, the measurement matrix may have missing entries, either because certain feature points are occluded in some frames or because the tracking algorithm fails. In our method, we try to reduce the number of frames used in the reconstruction process, so a number of feature points may be missed from one frame to the next. A prediction process is therefore needed to estimate the values of the missing feature points. We have exploited the linear statistical model to predict the positions of the feature points in frames that show a large difference in optical flow.
IV A. Step 1: Predicting the location of the feature points using the linear statistical model and the flow-based model
In some situations, a tracking algorithm may fail to track feature points into the next frame because the object rotates in front of the camera. In our case, an object rotates in front of the camera, and such movement causes some feature points to disappear and others to appear. The disappearance of a feature point follows a decrease in the displacement between its positions in consecutive frames as the object turns; the feature finally disappears when the object fully rotates to one side. This behavior is the reason for using a regression model as the prediction model. In the following subsections, we present our method for predicting the locations of the feature points.
The linear statistical model predicts the value a feature point would have taken if it had not been missed, using its displacement history before it was missed. We use a form of multiple regression analysis in which the relationship between one or more independent variables and another variable is modeled by a least-squares function, called a linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients.
Step 1.1: Predicting the 3D structure points
The following equation indicates that the position of the feature point, defined in Eq. (3), in frame f is its position in frame f−1 plus the flow displacement of the point between frames f−1 and f.
where d is the displacement between the positions of the feature point in the two frames.
where r is the expected ratio of increase or decrease in the displacement over the last set of frames containing the feature point. The value of r is estimated using linear regression. We assume that the missed point appeared in at least four frames before being missed, so we use four predictors to estimate the ratio of increase or decrease in the displacement of the feature point over the last set of frames in which it appears. The horizontal and vertical displacements of the feature point are predicted with separate equations, using the following predictors from the past four frames:
The u- and v-displacements of the feature point in the previous four frames, f−4 to f−1, are represented by:
So, the equation to predict the u-displacement of the feature point is
Similarly, the equation to predict the v-displacement of the feature point in frame f is:
where the two quantities above are the predicted u- and v-displacement values for the feature point. To solve the normal equations for the β matrix, we substitute the five displacement values (including the predicted one) with the existing displacement values of five points. After obtaining the β matrix, the linear system is ready to predict the displacement of a feature point in frame f given its displacements over the previous four frames.
By substituting the displacement values into the equations, the predicted u- and v-coordinates of the feature point are:
and the predicted feature point takes the form:
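The regression fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name, the toy displacement history, and the use of NumPy's least-squares solver are our own assumptions. It fits the four-predictor linear model to a point's displacement history and applies it to the last four observed displacements.

```python
import numpy as np

def predict_displacement(history):
    """Predict the next u- or v-displacement of a feature point from
    its displacement history, using a four-predictor linear model
    fitted by least squares (at least five values are needed)."""
    history = np.asarray(history, dtype=float)
    # Design matrix: each row is four consecutive displacements; the
    # target is the displacement that immediately follows them.
    X = np.array([history[i:i + 4] for i in range(len(history) - 4)])
    y = history[4:]
    # Least-squares estimate of the regression coefficients (beta).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Apply the fitted model to the most recent four displacements.
    return float(history[-4:] @ beta)

# Toy history: a point whose displacement shrinks as the object turns.
d_u = [16.0, 8.0, 4.0, 2.0, 1.0, 0.5]
predicted_du = predict_displacement(d_u)
```

The same routine is applied separately to the u- and v-displacement histories, matching the separate horizontal and vertical equations in the text.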
Step 1.2: Generating the predicted structure matrix
After predicting the feature points (Step 1.1) and removing the wrong feature points (Step 1.3), the predicted structure matrix is generated using the factorization method.
Eqs. 56 and 57 in Step 1.1 give the coordinates of the predicted feature points. Following Eq. 6, the measurement matrix that contains predicted feature points is called the predicted measurement matrix and has the form:
(a 2F × P matrix stacking the u- and v-coordinates of the tracked points over the frames)
F: number of frames
P: number of points
Note that not all entries of the predicted measurement matrix are predicted; only some of them are. Accordingly, we compute the predicted registered measurement matrix by subtracting the means of the u and v entries in each row of the measurement matrix.
(each u-entry has its row mean subtracted, and likewise each v-entry)
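As a small illustration of the registration just described (a sketch with invented variable names and toy data, not the paper's code), the per-row means can be removed from a 2F × P measurement matrix like so:

```python
import numpy as np

# Toy 2F x P measurement matrix: F frames of u-rows stacked over
# F frames of v-rows (names W and W_reg are ours, not the paper's).
F, P = 3, 5
rng = np.random.default_rng(0)
W = rng.uniform(0.0, 100.0, size=(2 * F, P))  # stand-in for tracked coords

# Registration: subtract each row's mean (the u-bar / v-bar of that
# frame), re-centering every frame's coordinates on the point centroid.
row_means = W.mean(axis=1, keepdims=True)
W_reg = W - row_means
```

After registration every row of `W_reg` sums to zero, which is what makes the rank-3 factorization of the next step possible.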
The predicted registered measurement matrix can be expressed in matrix form as the product of two matrices, where
M represents the 2F × 3 motion matrix, and
S represents the 3 × P predicted shape matrix.
The rows of M represent the predicted orientations of the horizontal and vertical image axes throughout the stream. The columns of S are the coordinates of the predicted 3D feature points with respect to their centroid.
The centroid can be computed as:
The registered measurement matrix is decomposed by SVD (singular value decomposition). Keeping the three largest singular values obtained, we estimate the respective motion matrix and structure matrix to be:
The true values of the predicted motion and structure matrices are given by
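The SVD step can be illustrated on synthetic rank-3 data. This is a hedged sketch: the variable names are ours, the data is random, and the metric upgrade that resolves the remaining ambiguity is omitted.

```python
import numpy as np

# Synthetic noise-free rank-3 registered measurement matrix built from
# a random motion matrix (2F x 3) and shape matrix (3 x P).
rng = np.random.default_rng(1)
F, P = 4, 10
M_true = rng.normal(size=(2 * F, 3))
S_true = rng.normal(size=(3, P))
W_reg = M_true @ S_true

# Decompose by SVD and keep the three largest singular values to get
# the estimated motion and structure matrices.  These equal the true
# factors only up to an invertible 3x3 ambiguity, which the metric
# constraints of the factorization method would resolve.
U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
M_hat = U[:, :3] * s[:3]     # 2F x 3 motion estimate
S_hat = Vt[:3, :]            # 3 x P structure estimate
```

Multiplying the two estimates reproduces the registered measurement matrix exactly in this noise-free case; with noise, the rank-3 truncation gives the best least-squares approximation.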
Step 1.3: Removing the wrong feature points
Our prediction model, along with the flow-based method, is also used to remove wrong feature points from frames. Some features are wrongly selected by the KLT tracker because of noise or intensity differences in the background. Wrong points are characterized by their slow movement through the sequence of frames compared to the movement of the moving object. In our method, wrong feature points are removed from the set of feature points. Let the position of a feature point in some frame be p_fp, where f is the frame index and p is the feature-point index.
Feature points are considered wrong if their whole displacement over the frames in which they appear is less than a threshold. A point is deleted if its overall displacement across consecutive frames falls below this threshold; we delete a wrong feature point by assigning 0 to its value.
where the quantity above represents the sum of the horizontal and vertical displacements of the feature point from the first frame in which it appears up to the last.
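The thresholding rule of Step 1.3 can be sketched as follows; the function names, toy tracks, and threshold value are illustrative assumptions, not the paper's code.

```python
import numpy as np

def total_displacement(track):
    """Sum of horizontal plus vertical displacement of one feature
    point over the consecutive frames in which it appears."""
    track = np.asarray(track, dtype=float)   # shape (frames, 2): (u, v)
    steps = np.abs(np.diff(track, axis=0))   # per-frame |du|, |dv|
    return float(steps.sum())

def remove_wrong_points(tracks, threshold):
    """Zero out tracks whose overall displacement is below the
    threshold, mirroring the paper's convention of assigning 0
    to removed feature points."""
    return [t if total_displacement(t) >= threshold
            else np.zeros_like(np.asarray(t, dtype=float))
            for t in tracks]

moving = [(0.0, 0.0), (3.0, 1.0), (6.0, 2.0)]      # point on the object
static = [(50.0, 50.0), (50.2, 50.1), (50.1, 50.0)]  # background noise
kept = remove_wrong_points([moving, static], threshold=2.0)
```

The slow-moving background point is zeroed while the genuine object point survives, which is exactly the behavior the flow-based criterion is meant to capture.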
IV B. Step 2: Validating the predicted feature points using the feature-based method
Some feature points are wrongly predicted, so a correction method is necessary to validate the prediction process. Moreover, since we used prediction to remove wrongly chosen feature points, correction is also used to make sure that we did not remove a good feature point. We use the feature-based method to detect and correct the values of the wrongly predicted feature points.
Step 2.1: Correcting the predicted feature points
To correct a predicted 3D point, we first convert it to its corresponding 2D point in the frames that show a large difference in optical flow, in the following manner:
Given the predicted point and the predicted motion for a frame f that shows a large difference in optical flow (compared with its previous frame), the corresponding 2D point in frame f can be estimated using the orthographic projection,
where the motion and translation terms can be estimated.
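The orthographic back-projection can be illustrated with a toy example. All names and numbers here are our own, assuming the standard factorization convention that a frame's two motion rows project a centroid-relative 3D point and the frame's row means restore the image offset:

```python
import numpy as np

# Frame f's two rows of the predicted motion matrix (toy values: an
# identity-like camera looking straight down the z-axis).
m_u = np.array([1.0, 0.0, 0.0])   # horizontal motion row
m_v = np.array([0.0, 1.0, 0.0])   # vertical motion row
u_bar, v_bar = 120.0, 80.0        # row means removed during registration
s = np.array([3.0, -2.0, 5.0])    # predicted 3D point, centroid-relative

# Orthographic projection: dot the motion rows with the 3D point and
# add back the per-frame means to land in image coordinates.
u = m_u @ s + u_bar
v = m_v @ s + v_bar
```

The resulting (u, v) is the estimated 2D location that the KLT-based correction of Step 2.1 then searches around.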
Given the estimated point, we use the KLT tracker to search for intensity similarity within a window to correct the location of the estimated 2D point, in the following manner.
First, we set a window around the estimated 2D point. Then the KLT algorithm uses the local image intensity gradient vector to find the corresponding feature point at locations where the minimum eigenvalue of the 2 × 2 symmetric matrix G in Eq. (KLT1) is above some threshold, where
After that, the KLT algorithm uses the G matrix to find the displacement between the previous frame and the current frame. A solution may be found by searching for a displacement d that minimizes the sum of squared intensity differences in the cost function of Eq. (KLT3):
We then go through Eqs. (KLT3) to (KLT5) to find the displacement d.
Once the displacement is calculated, the corrected feature point can be computed from the previous frame by the following equation:
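The displacement solve that the KLT tracker performs can be sketched on synthetic gradients. This is a generic Lucas–Kanade illustration, not the paper's Eqs. (KLT1)–(KLT5): G accumulates the windowed gradient products, e the gradient-weighted temporal difference, and d solves G d = e.

```python
import numpy as np

# Synthetic window gradients (7x7); in practice these come from the
# image patch around the estimated feature point.
rng = np.random.default_rng(2)
gx = rng.normal(size=(7, 7))     # horizontal intensity gradients
gy = rng.normal(size=(7, 7))     # vertical intensity gradients
d_true = np.array([0.4, -0.2])   # displacement to recover

# Under a small translation, the temporal intensity difference is
# approximately gx*dx + gy*dy at every pixel of the window.
It = gx * d_true[0] + gy * d_true[1]

# 2x2 symmetric gradient matrix G; the tracker accepts a point only
# when G's minimum eigenvalue is above a threshold.
G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
              [np.sum(gx * gy), np.sum(gy * gy)]])
min_eig = np.linalg.eigvalsh(G).min()

# Mismatch vector and the displacement solve G d = e.
e = np.array([np.sum(gx * It), np.sum(gy * It)])
d = np.linalg.solve(G, e)
```

On this noise-free patch the solve recovers the true displacement exactly; real images add noise, which is why the tracker iterates and thresholds the eigenvalue.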
If the KLT tracker finds the displacement of the estimated feature, the predicted point is replaced with the corrected point. Otherwise, if no intensity similarity is found, the feature point is considered a missing point and its value is estimated as suggested by Eqs. (a) to (c) in Section III.
After correcting the value of the predicted point, the new 3D point is projected as described in Eq. (3).
Step 2.2: Validating the removed wrong feature points
To verify whether the removed feature points were wrong, the KLT tracker is applied.
We first convert the removed 3D point to its corresponding 2D points in all the frames in which it appears. The corresponding 2D point in frame f can be estimated using the orthographic projection,
where the motion and translation terms can be estimated.
Then we set a window around the estimated feature point, based on the predefined window size used by the KLT tracker when tracking feature points among the frame subsequences. The window location is fixed, meaning that it does not move from the first frame containing this feature point up to the last.
Now the KLT tracker searches for intensity similarity within the window to look for the wrong feature point. The KLT algorithm uses the local image intensity gradient vector to find the corresponding feature point at locations where the minimum eigenvalue of the 2 × 2 symmetric matrix G in Eq. (KLT1) is above some threshold, where
After that, the KLT algorithm uses the G matrix to find the displacement between the previous frame and the current frame for all frames in which the feature point appears. A solution may be found by searching for a displacement d that minimizes the sum of squared intensity differences in the cost function of Eq. (KLT3):
We then go through Eqs. (KLT3) to (KLT5) to find the displacement. If the displacement is found within the small window, this indicates that the feature point did not move much, so it is indeed a wrong feature point and will not be recovered. If the feature point is not detected within the small window, this indicates that it is not a wrong point, and it must be recovered.
After recovering the value of the removed point, the new 3D point is projected.
The prediction and validation are repeated for every subsequence of frames, as indicated by the flow chart.
We tested our online 3D reconstruction method on two real image sequences:
1. Hotel Dataset:
A real image sequence of a small model building was used in this paper. The dataset was prepared in the laboratory using a camera mounted on a computer-controlled movable platform. The camera motion included substantial translation away from the object and across the field of view. The dataset consists of one hundred frames.
Five frames of the hotel dataset are shown:
The following picture is the output of the KLT feature tracker using the predictive online reconstruction method.
Left image: The first image of the stream with the extracted feature points.
Right image: The tracking path of the feature points over the remaining 99 frames using the predictive online reconstruction method.
3D structure points from different viewpoints using the predictive reconstruction method of hotel image stream
The hotel 3D model with texture mapping:
2. Face Dataset: a male subject turning his face left and right while facing the camera.
We collected the dataset by capturing a continuous 6-second video of a male subject. The camera was positioned at the level of the subject's head, about 30 cm from his face. The video was recorded under normal white light, and the brightness did not change during recording. The background behind the face contained different objects and colors but was fixed during the video recording. The Image Ready software was then used to cut slices from the video every 0.03 seconds (30 msec). The software generated two hundred frames, each of size 1 KB. The two hundred frames were divided into consecutive sequences of 10 frames each.
Five frames of the face dataset are shown:
The output of the KLT feature tracker using the predictive online reconstruction method.
Left image: The first image of the face image stream with the extracted feature points.
Right image: The tracking path of the feature points over the remaining frames using the predictive online reconstruction method.
The 3D points of the face structure
Left image: The face mask
Right image: The face model with texture mapping
Comparison between the offline and the online reconstruction method
Rotation (Face dataset)
Complexity (Number of frames & Complexity)
From the results above, we note the effectiveness of our online 3D face model estimation from image sequences. The method removes redundancy by discarding frames that do not show a large difference in optical flow, and it does so efficiently without compromising the quality of the 3D model.