Online Update Using the Predictive Method
Introduction
Goal:
We do not perform a full reconstruction from every corresponding set of frames. Instead, the previously constructed model is updated with the current 3D information. The points that exhibit an intensity change between the previous and current sets of frames are the ones that affect the current temporal 3D model. Therefore, we use only this small subset of points to estimate the changes in the new model.
Methodology:
Table 3. Predictive algorithm of the online reconstruction

Step 1: Updating using prediction of the linear statistical model (Section IV A)
    1.1 Predicting the 3D structure points
    1.2 Generating the predicted structure matrix
    1.3 Removing the wrong 3D points
Step 2: Validating the predicted values using the feature-based method (Section IV B)
    2.1 Validating the predicted feature points
    2.2 Validating the removed wrong feature points
    2.3 Repeat Step 1 and Step 2 until there are no more frame subsequences
In the structure from motion approach, a number of feature points are tracked and a measurement matrix is formed in which each element corresponds to the image coordinates of a tracked point. The factorization method is then used to recover the camera motion and the 3D model from those points. In any realistic situation, the measurement matrix may have missing entries, either because certain feature points are occluded in some frames or because the tracking algorithm fails. In our method, we try to reduce the number of frames used in the reconstruction process, so a number of feature points may be missing from one frame to the next. A prediction process is therefore needed to estimate the values of these feature points. We exploit the linear statistical model to predict the positions of the feature points in frames that show a big difference in optical flow.
IV A. Step 1: Predicting the location of the feature points using the linear statistical model and the flow-based model
In some situations, a tracking algorithm may fail to track proper feature points in the next frame. This is due to the rotation of the object in front of the camera. In our case, we have an object rotating in front of the camera, and such movement may cause some feature points to disappear and others to appear. A feature point goes missing after a gradual decrease in the displacement between its positions in consecutive frames as the object turns, and it finally disappears when the object rotates completely to one side. This behavior is the reason for using a regression model as the prediction model. In the following subsection, we present our method for predicting the locations of the feature points.
The linear statistical model is used to predict the value a feature point would have had if it had not gone missing, using its displacement history before it disappeared. We use a form of multiple regression analysis in which the relationship between one or more independent variables and another variable is modeled by a least squares function, called a linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients.
Step 1.1: Predicting the 3D structure points
The following equation states that the position of the feature point $p_i = (u_i, v_i)$, defined in Eq. (3), in frame $f$ is its position in frame $f-1$ plus the flow displacement of the point between frames $f-1$ and $f$. That is:

$u_i^f = u_i^{f-1} + d_{u,i}^{f}$, and $v_i^f = v_i^{f-1} + d_{v,i}^{f}$

where $d_i^f = (d_{u,i}^f, d_{v,i}^f)$ is the displacement between the positions of the feature point in the two frames.
The expected ratio of increase or decrease in the displacement over the last set of frames containing the feature point is estimated using linear regression. We assume that the missing point appeared in at least four frames before going missing, so we use four predictors to predict this ratio over the last set of frames the point appears in. The horizontal displacement and the vertical displacement of the feature point are predicted by separate equations. The predictors, drawn from the past four frames $f-4$ to $f-1$, are the u- and v-displacements of the feature point:

$d_{u,i}^{f-4}, d_{u,i}^{f-3}, d_{u,i}^{f-2}, d_{u,i}^{f-1}$ and $d_{v,i}^{f-4}, d_{v,i}^{f-3}, d_{v,i}^{f-2}, d_{v,i}^{f-1}$
So, the equation to predict the u-displacement of the feature point in frame $f$ is

$\hat{d}_{u,i}^{f} = \beta_0 + \beta_1 d_{u,i}^{f-1} + \beta_2 d_{u,i}^{f-2} + \beta_3 d_{u,i}^{f-3} + \beta_4 d_{u,i}^{f-4}$

Similarly, the equation to predict the v-displacement of the feature point in frame $f$ is

$\hat{d}_{v,i}^{f} = \beta_0 + \beta_1 d_{v,i}^{f-1} + \beta_2 d_{v,i}^{f-2} + \beta_3 d_{v,i}^{f-3} + \beta_4 d_{v,i}^{f-4}$

where $\hat{d}_{u,i}^{f}$ and $\hat{d}_{v,i}^{f}$ are the predicted displacement values for the feature point. To solve the normal equations for the $\beta$ vector, we substitute the existing displacement values of five points. Once the $\beta$ vector is known, the linear system can predict the displacement of a feature point in frame $f$ given its displacements in the previous four frames.
By substituting the predicted displacements into the equations above, the predicted u- and v-coordinates of the feature point are:

$\hat{u}_i^f = u_i^{f-1} + \hat{d}_{u,i}^{f}$ and $\hat{v}_i^f = v_i^{f-1} + \hat{d}_{v,i}^{f}$

and the predicted feature point takes the form $\hat{p}_i^f = (\hat{u}_i^f, \hat{v}_i^f)$.
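As a sketch of this prediction step (assuming NumPy; the function name and the use of a least-squares solver for the normal equations are illustrative, not the authors' exact implementation), the four-predictor regression can be written as:

```python
import numpy as np

def predict_displacement(history):
    """Predict the next u- or v-displacement of a feature point from its
    displacement history, using linear regression on the previous four
    displacements (the four predictors of Step 1.1)."""
    h = np.asarray(history, dtype=float)
    # One training row per frame that has four preceding displacements.
    X = np.array([h[k:k + 4] for k in range(len(h) - 4)])
    y = h[4:]
    X = np.hstack([np.ones((len(X), 1)), X])      # intercept term beta_0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
    # Predict from the last four observed displacements.
    return float(np.concatenate(([1.0], h[-4:])) @ beta)
```

For a displacement history that grows or shrinks linearly, the regression reproduces the trend; the same routine is applied separately to the u- and v-displacements.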
Step 1.2: Generating the predicted structure matrix
After predicting the feature points (Step 1.1) and removing the wrong feature points (Step 1.3), the predicted structure matrix is generated using the factorization method. Eqs. (56) and (57) in Step 1.1 give the coordinates of the predicted feature points. According to Eq. (6), the measurement matrix that contains some predicted feature points is called the predicted measurement matrix $\hat{W}$ and has the form:

$\hat{W} = \begin{bmatrix} u_1^1 & \cdots & u_P^1 \\ \vdots & & \vdots \\ u_1^F & \cdots & u_P^F \\ v_1^1 & \cdots & v_P^1 \\ \vdots & & \vdots \\ v_1^F & \cdots & v_P^F \end{bmatrix}$
where $F$ is the number of frames and $P$ is the number of points. Note that not all points in the predicted measurement matrix are predicted; only some of them are. Accordingly, we compute the predicted registered measurement matrix $\tilde{W}$ by subtracting the means $\bar{u}_f$ and $\bar{v}_f$ from $\hat{W}$, where $\bar{u}_f$ and $\bar{v}_f$ are the means of the $u$ and $v$ entries in each row of the measurement matrix.
$\tilde{W}$ can be expressed in matrix form:

$\tilde{W} = \hat{M}\hat{S}$

where $\hat{M}$ (of size $2F \times 3$) represents the motion matrix and $\hat{S}$ (of size $3 \times P$) represents the predicted shape matrix. The rows of $\hat{M}$ represent the predicted orientations of the horizontal and vertical motion throughout the stream. The columns of $\hat{S}$ are the coordinates of the predicted 3D feature points with respect to their centroid, which can be computed as:

$c = \frac{1}{P} \sum_{i=1}^{P} s_i$
The registered measurement matrix is decomposed into $\tilde{W} = U \Sigma V^T$ by SVD (Singular Value Decomposition). If $U_3 \Sigma_3 V_3^T$ corresponds to the 3 largest singular values obtained, we estimate the respective motion matrix and structure matrix to be:

$\hat{M} = U_3 \Sigma_3^{1/2}, \qquad \hat{S} = \Sigma_3^{1/2} V_3^T$

The true values of the predicted motion and structure matrices are given by $M = \hat{M} A$ and $S = A^{-1} \hat{S}$, where $A$ is the invertible $3 \times 3$ matrix that enforces the metric constraints.
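A minimal sketch of this factorization step (assuming NumPy; the names are illustrative, and the metric-constraint correction by $A$ is omitted):

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix (Step 1.2).
    Subtracts the per-row means to register the matrix, then splits the
    three largest SVD components between motion and shape."""
    means = W.mean(axis=1, keepdims=True)   # row means (u_bar, v_bar)
    W_reg = W - means                       # registered measurement matrix
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root                 # motion estimate, 2F x 3
    S_hat = root[:, None] * Vt[:3, :]       # shape estimate, 3 x P
    return M_hat, S_hat, means
```

Because an orthographic measurement matrix has rank at most 3 after registration, `M_hat @ S_hat` reproduces the registered matrix up to numerical precision.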
Step 1.3: Removing the wrong feature points
Our prediction model, together with the flow-based method, is also used to extract the wrong feature points from the frames. Some features are wrongly selected by the KLT tracker due to noise or differences in intensity values in the background. The wrong points are characterized by their slow movement through the sequence of frames compared to the movement of the moving object. In our method, wrong feature points are removed from the set of feature points. Let the position of a feature point in some frame be $p_i^f = (u_i^f, v_i^f)$, where $f$ is the frame index and $i$ is the feature-point index.
A feature point is considered wrong if its total displacement over the consecutive frames it appears in is less than a threshold; such a point is deleted by assigning 0 to its value:

$D_i = \sum_{f=f_0+1}^{f_1} \left( |u_i^f - u_i^{f-1}| + |v_i^f - v_i^{f-1}| \right)$

where $D_i$ is the sum of the horizontal and vertical displacements of the feature point from the first frame it appears in, $f_0$, up to the last frame it appears in, $f_1$.
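The removal rule can be sketched as follows (NumPy assumed; the function names and the per-track list representation are illustrative):

```python
import numpy as np

def total_displacement(track):
    """D_i: sum of horizontal and vertical displacements of one feature
    point over the consecutive frames it appears in (Step 1.3)."""
    t = np.asarray(track, dtype=float)       # shape (F, 2): (u, v) per frame
    return float(np.abs(np.diff(t, axis=0)).sum())

def remove_wrong_points(tracks, threshold):
    """Zero out (delete) feature points whose overall displacement across
    their frames is below the threshold; keep the rest unchanged."""
    return [np.asarray(t, dtype=float) if total_displacement(t) >= threshold
            else np.zeros((len(t), 2))
            for t in tracks]
```

A slow-moving background point accumulates almost no displacement and is zeroed out, while a point on the rotating object easily exceeds the threshold.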
IV B. Step 2: Validating the predicted feature points using the feature-based method
Some feature points are wrongly predicted, so a correction method is necessary to validate the prediction process. Moreover, since we used prediction to remove the wrongly chosen feature points, the correction is also used to make sure that we did not remove a good feature point. We use the feature-based method to detect and correct the values of the wrongly predicted feature points.
Step 2.1: Correcting the predicted feature points
To correct a predicted 3D point $\hat{s}_i$, we first convert it to its corresponding 2D point in the frames that show a big difference in optical flow. Given the predicted point $\hat{s}_i$ and the predicted motion rows $\hat{m}_f$ and $\hat{n}_f$ for a frame $f$ that shows a big difference in optical flow (compared with its previous frame), the corresponding 2D point in frame $f$ can be estimated using the orthographic projection:

$\hat{u}_i^f = \hat{m}_f^T \hat{s}_i + \bar{u}_f, \qquad \hat{v}_i^f = \hat{n}_f^T \hat{s}_i + \bar{v}_f$

where $\bar{u}_f$ and $\bar{v}_f$ can be estimated from the row means of the measurement matrix.
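The back-projection of a predicted 3D point into a frame under the orthographic model can be sketched as (NumPy assumed; `m_f` and `n_f` stand for frame f's two rows of the motion matrix and `u_bar`, `v_bar` for the row means, as an illustration rather than the authors' code):

```python
import numpy as np

def project_point(s, m_f, n_f, u_bar, v_bar):
    """Orthographic projection of a 3D point s into frame f:
    u = m_f . s + u_bar,  v = n_f . s + v_bar."""
    return float(np.dot(m_f, s)) + u_bar, float(np.dot(n_f, s)) + v_bar
```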
Having the estimated point $\hat{p}_i^f$, we use the KLT tracker to search for similar intensity information within a window and correct the location of the estimated 2D point in the following manner. First, we set a window around the estimated 2D point $\hat{p}_i^f$ in frame $f$. Then, the KLT algorithm uses the local image intensity gradient vector to find the corresponding feature point at locations where the minimum eigenvalue of the symmetric matrix $G$ in Eq. (KLT1) is above some threshold, where

$G = \sum_{x \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$  (KLT1)

After that, the KLT algorithm uses the matrix $G$ to find the displacement $d$ between the previous frame $f-1$ and the current frame $f$. A solution for the displacement may be found by minimizing the sum of squared intensity differences in the cost function of Eq. (KLT3):

$\epsilon(d) = \sum_{x \in W} \left[ I_f(x + d) - I_{f-1}(x) \right]^2$  (KLT3)

We then go through the equations from (KLT3) to (KLT5) to find the displacement $d$. Once the displacement is calculated, the corrected feature point can be computed from the previous frame as:

$p_i^f = p_i^{f-1} + d$
If the KLT tracker finds the displacement of the estimated feature, the predicted point is replaced with the corrected point. Otherwise, if no intensity similarity is found, the feature point is considered a missing point and its value is estimated as suggested by equations (a) to (c) in Section III. After correcting the value of the predicted point, the new 3D point is projected as described in Eq. (3).
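The trackability test at the heart of this step, the minimum eigenvalue of the gradient matrix $G$ from Eq. (KLT1), can be sketched as follows (NumPy assumed; a real KLT implementation would also weight the window and iterate the displacement equations):

```python
import numpy as np

def min_eigenvalue_G(patch):
    """Smallest eigenvalue of the KLT gradient matrix G accumulated over
    an image window: G = sum [[Ix^2, IxIy], [IxIy, Iy^2]] (Eq. KLT1)."""
    Iy, Ix = np.gradient(patch.astype(float))   # intensity gradients
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return float(np.linalg.eigvalsh(G)[0])      # smallest eigenvalue

def is_trackable(patch, threshold):
    """A corrected location is accepted only where min eig(G) exceeds the
    threshold, i.e. the window has gradient in both directions."""
    return min_eigenvalue_G(patch) > threshold
```

A uniform window has $G = 0$ and is rejected; a window with gradients in both directions, such as the bilinear patch `np.outer(np.arange(5.0), np.arange(5.0))`, passes.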
Step 2.2: Validating the removed wrong feature points
To verify whether the removed feature points were indeed wrong, the KLT tracker is applied. We first convert the removed 3D point to its corresponding 2D point in all the frames it appears in. The corresponding 2D point in frame $f$ can be estimated using the orthographic projection:

$\hat{u}_i^f = \hat{m}_f^T \hat{s}_i + \bar{u}_f, \qquad \hat{v}_i^f = \hat{n}_f^T \hat{s}_i + \bar{v}_f$

where $\bar{u}_f$ and $\bar{v}_f$ can be estimated from the row means of the measurement matrix.
Then, we set a window around the estimated feature point whose size is a fixed multiple of $N$, where $N$ is the predefined window size used by the KLT tracker when tracking the feature points among the frame subsequences. The window location is fixed: it does not move from the first frame containing this feature point up to the last.
Now, the KLT tracker is used to search for similar intensity information within the window to look for the wrong feature point. The KLT algorithm uses the local image intensity gradient vector to find the corresponding feature point at locations where the minimum eigenvalue of the symmetric matrix $G$ in Eq. (KLT1) is above some threshold. After that, the algorithm uses the matrix $G$ to find the displacement between the previous frame and the current frame, for all frames the feature point appears in, by minimizing the sum of squared intensity differences in the cost function of Eq. (KLT3).
We then go through the equations from (KLT3) to (KLT5) to find the displacement $d$. If the displacement is found within the small window, it indicates that the feature point did not move much; it is a wrong feature point and is not recovered. If the feature point is not detected within the small window, it is not a wrong point, and it must be recovered. After recovering the value of the removed point, the new 3D point is projected.
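The recover-or-discard decision of Step 2.2 can be sketched as follows (NumPy assumed; the name and the (F, 2) track representation are illustrative): the validation window stays fixed at the point's first position, and a point whose matches never leave it is confirmed wrong.

```python
import numpy as np

def confirmed_wrong(track, half_window):
    """Return True if every later match of the feature point stays inside
    the fixed window centred on its first position (Step 2.2): the point
    barely moved, so the removal was correct and it is not recovered."""
    t = np.asarray(track, dtype=float)
    return bool(np.all(np.abs(t - t[0]) <= half_window))
```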
The prediction and validation are repeated for every subsequence of frames, as indicated by the flow chart.
Results
Data Conditions:
We tested our online 3D face
reconstruction method on two real image sequences:
1. Hotel Dataset:
A real image sequence of a small model building was used in this paper. The dataset was prepared in the laboratory using a camera mounted on a computer-controlled movable platform. The camera motion included substantial translation away from the camera and across the field of view. The dataset consists of one hundred frames.
Five frames of the hotel dataset
are shown:
The following figure shows the output of the KLT feature tracker using the predictive online reconstruction method.
Left image: the first image of the stream with the extracted feature points.
Right image: the tracking path of the feature points over the remaining 99 frames using the predictive online reconstruction method.
3D structure points from different viewpoints, obtained using the predictive reconstruction method on the hotel image stream.
The hotel 3D model with texture mapping:
2. Face Dataset: a male subject turning his face left and right while facing the camera.
We collected the dataset by capturing a continuous 6-second video of the subject. The camera was positioned at the level of the subject's head, about 30 cm from his face. The video was recorded under normal white light, and the brightness did not change during recording. The background behind the face contained different objects and colors but was fixed during the recording. Image Ready software was then used to cut slices from the video every 0.03 seconds (30 msec). The software generated two hundred frames, each of size 1 KB. The two hundred frames were divided into consecutive subsequences of 10 frames each.
Five frames of the face dataset
are shown:
The output of the KLT feature tracker using the predictive online reconstruction method.
Left image: the first image of the face image stream with the extracted feature points.
Right image: the tracking path of the feature points over the remaining frames using the predictive online reconstruction method.
The 3D points of the face structure
Left image: The face mask
Right image: The face model with texture mapping
Comparison between the offline and the online reconstruction method
Rotation (Face dataset)
Complexity (Number of frames & Complexity)
The results above show the effectiveness of our online 3D face model estimation from image sequences. The method removes redundancy by eliminating frames that do not show a big difference in optical flow, and it does so efficiently without compromising the quality of the 3D model.