Our goal is to develop an efficient method for updating the 3D model of a moving object. In practice, scenes seldom change abruptly, except in a few situations where lighting conditions change suddenly. The background of a scene is largely static, and the foreground changes only slightly between subsequent frames. Typically, the whole scene changes slowly over a long sequence of frames. We exploit this observation to reduce the reconstruction time.
We do not perform a full reconstruction from every corresponding set of frames. Instead, the previously constructed model is updated with the current 3D information. The points that differ between the previous and current sets of frames, i.e., those that exhibit an intensity change, are the ones that affect the current temporal 3D model. Therefore, we use only this small subset of points to estimate the changes in the new model.
A combination of a feature-based method and a direct method is used for the initial reconstruction of the 3D model. The direct method gives the optical flow from a pair of images in a sequence. This information is used to determine whether there is a difference in the displacement of feature points between the two images. If the difference is large, the second image is used in the feature-based method to construct the 3D model; otherwise, the second frame is ignored.
Structure From Motion
We adopt a structure-from-motion algorithm to recover the 3D model from relatively long image sequences. The image sequences are generated from a video stream captured by moving the object of interest in front of a desktop camera. Structure-from-stereo, or the stereo-vision approach, makes use of projective invariants from two (or three) images taken from different viewpoints.
Feature correspondences are the locations of the same features in different images. Using the feature correspondences, epipolar constraints under projective invariance can be written over the image sequence to give a system of equations. Solving this system of equations recovers camera motion and structure simultaneously.
The accuracy of such feature-based algorithms depends on the accuracy of the feature tracker. We used the robust and reliable Kanade-Lucas-Tomasi (KLT) tracker to identify and match features from frame to frame in long image sequences.
The output of the KLT tracker is the input to the factorization method proposed by Tomasi and Kanade for the recovery of shape and motion from a sequence of images. The method achieves its accuracy and robustness by applying the well-understood Singular Value Decomposition (SVD) to a large number of images and feature points.
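The core of the factorization method can be sketched as follows. This is a minimal NumPy illustration of the SVD step under an orthographic camera, not the paper's implementation; the function name and the 2F x P layout of the measurement matrix are our assumptions.

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade-style factorization sketch.

    W: 2F x P measurement matrix stacking the x- and y-coordinates
    of P tracked feature points over F frames.
    Returns motion M (2F x 3) and shape S (3 x P), up to an
    affine ambiguity (the metric upgrade step is omitted here).
    """
    # Register measurements: subtracting each row's mean removes translation.
    W_reg = W - W.mean(axis=1, keepdims=True)
    # Under orthography, W_reg has rank at most 3; a rank-3 truncated SVD
    # splits it into camera motion and 3D shape.
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])          # 2F x 3 motion
    S = np.sqrt(s[:3])[:, None] * Vt[:3]   # 3 x P shape
    return M, S
```

On noise-free orthographic measurements the product M @ S reproduces the registered measurement matrix exactly, which is a quick sanity check for the rank-3 assumption.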
The camera projection used is the orthographic model:
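A standard way to write the orthographic model in the Tomasi-Kanade setting is the following; the symbols here (image axes $\mathbf{i}_f$, $\mathbf{j}_f$, camera translation $\mathbf{t}_f$, scene point $\mathbf{s}_p$) are our notation for the usual formulation, not necessarily the original equation of this paper.

```latex
\[
u_{fp} = \mathbf{i}_f^{\top}\!\left(\mathbf{s}_p - \mathbf{t}_f\right), \qquad
v_{fp} = \mathbf{j}_f^{\top}\!\left(\mathbf{s}_p - \mathbf{t}_f\right),
\]
```

where $(u_{fp}, v_{fp})$ is the image position of feature point $p$ in frame $f$: depth along the optical axis is simply dropped, which is what makes the measurement matrix rank-3.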
Optical flow refers to the apparent motion of the image intensity or brightness pattern over time. The motion field can be thought of as the projection of the 3D velocity field onto the image plane.
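The standard brightness-constancy derivation behind optical flow can be made explicit (our notation, stating the usual assumption that a point's intensity is preserved over a small time step $\delta t$):

```latex
\[
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) \approx I(x, y, t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0,
\]
```

where $(u, v)$ is the flow at pixel $(x, y)$ and $I_x$, $I_y$, $I_t$ are the spatial and temporal image derivatives. This single constraint per pixel is what direct methods solve, with an additional smoothness or local-constancy assumption.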
Updating the 3D model (Contribution)
In this paper, we propose an efficient approach that exploits the similarities between subsequent frame sequences. Since subsequent frame sequences are highly similar, the 3D reconstruction from the subsequent set of frames is almost identical to the previous reconstruction.
Heuristically, we assume that most of the estimated 3D points in the new model remain the same and that only a few of them change. The previous model is therefore carried over to the current time.
Only 2D points exhibiting an intensity change in the new set of frames are used to amend the new 3D model.
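The carry-over idea can be sketched in a few lines. This is an illustrative NumPy sketch, not the paper's code; the function name, the 3 x P point layout, and the boolean `changed_mask` interface are our assumptions.

```python
import numpy as np

def update_model(points_prev, points_new, changed_mask):
    """Carry the previous 3D model forward, overwriting only the points
    whose 2D measurements exhibited an intensity change.

    points_prev, points_new: 3 x P arrays of estimated 3D points.
    changed_mask: length-P boolean array marking the changed features.
    """
    model = points_prev.copy()          # carry the old model over
    model[:, changed_mask] = points_new[:, changed_mask]  # amend the subset
    return model
```

Because only the masked columns are touched, the per-frame update cost scales with the number of changed points rather than with the full model size.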
We extend the two-stage algorithm discussed in Section II to update the depth information during subsequent modeling rather than performing the full reconstruction all over again. This method reconstructs the 3D model efficiently for each timeframe without compromising the quality of the model.
Stage 1: Direct-based Method
After selecting the feature points in the first frame, we start a loop; in each iteration, we compute the optical flow for the next consecutive image pair.
For the video stream sequence at time t1, we compute the optical flow between the first frame F1 and the second frame F2 of the sequence, resulting in an optical flow matrix, say OF.
We compare each entry of the matrix OF with a threshold and count the number of feature points whose optical flow exceeds the threshold. If this number is more than a third of the feature points, the frame is taken into account in the factorization method; otherwise, the frame is simply ignored.
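The selection rule above can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function name `keep_frame` and the P x 2 layout of OF (one (u, v) displacement per tracked feature) are illustrative, not the paper's interface.

```python
import numpy as np

def keep_frame(OF, threshold):
    """Decide whether a frame enters the factorization.

    OF: P x 2 array of per-feature optical-flow displacements (u, v).
    The frame is kept when more than a third of the feature points
    moved by more than `threshold` pixels.
    """
    magnitudes = np.linalg.norm(OF, axis=1)        # per-feature flow magnitude
    n_moving = np.count_nonzero(magnitudes > threshold)
    return n_moving > OF.shape[0] / 3
```

The one-third criterion keeps frames in which the object has moved appreciably, so the factorization in Stage 2 runs on a much shorter but still informative subsequence.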
Stage 2: Feature-based Method
We assume that the number of features extracted and the number of frames considered for feature correspondences remain the same across all iterations.
The factorization method is applied to the frames selected by the optical-flow criterion of Stage 1.
Results Part 2