To further improve the efficiency and accuracy of SKFA, we propose an Accelerated Kernel Feature Analysis (AKFA) that (i) saves computation time by iteratively updating the Gram matrix, (ii) normalizes each candidate image vector before the constraint is applied, and (iii) optionally discards data that falls below a threshold magnitude during updates.
For the first improvement (i), instead of extracting features directly from the original mapped space, AKFA extracts the $i$-th feature based on the $i$-th updated Gram matrix $K^i$, where each element has $k^i_{jk} = \langle \phi^i(\mathbf{x}_j), \phi^i(\mathbf{x}_k) \rangle$ (with $\phi^1 = \phi$). Since $\phi^i(\mathbf{x}_j) = \phi^{i-1}(\mathbf{x}_j) - \langle \phi^{i-1}(\mathbf{x}_j), \alpha_{i-1} \rangle \alpha_{i-1}$,

$$k^i_{jk} = k^{i-1}_{jk} - \frac{k^{i-1}_{j\,\mathrm{idx}(i-1)}\; k^{i-1}_{\mathrm{idx}(i-1)\,k}}{k^{i-1}_{\mathrm{idx}(i-1)\,\mathrm{idx}(i-1)}}, \qquad (13)$$

where $\mathrm{idx}(i-1)$ denotes the index of the image vector selected for the $(i-1)$-th feature. By updating the Gram matrix, it is unnecessary to save the projection of each individual data point onto all previous features. The computational cost of extracting the $i$-th feature becomes $O(n^2)$, instead of $O(in^2)$ as in SKFA.
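As a concrete illustration, here is a minimal NumPy sketch of the update (13); the function name `update_gram` and the variable names are ours, and `t` stands for $\mathrm{idx}(i-1)$:

```python
import numpy as np

def update_gram(K_prev, t):
    """Deflate the (i-1)-th Gram matrix by the feature selected at index t,
    cf. (13): k^i_jk = k^{i-1}_jk - k^{i-1}_jt * k^{i-1}_tk / k^{i-1}_tt."""
    return K_prev - np.outer(K_prev[:, t], K_prev[t, :]) / K_prev[t, t]
```

Each call costs $O(n^2)$, which is what makes the per-feature cost independent of $i$.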
The second improvement (ii) is to revise the constraint. As shown in (10), SKFA treats each individual sample as a possible direction and computes the projection variances over all data. Since SKFA includes the length of each vector in its projection-variance calculation, it is biased toward selecting vectors with larger magnitude. From the objective function (8), we know that we are actually looking for a direction of unit length. When we choose an image vector as a possible direction, we should therefore ignore its length and consider only its direction, which improves the accuracy of the features. Therefore, in our AKFA algorithm, we replace the constraint (9) by

$$\alpha \in \left\{ \frac{\phi^{i}(\mathbf{x}_j)}{\|\phi^{i}(\mathbf{x}_j)\|} \;\middle|\; j = 1, \dots, n \right\}.$$

The $i$-th feature is extracted by

$$\alpha_i = \arg\max_{\alpha} \frac{1}{n} \sum_{k=1}^{n} \left\langle \alpha, \phi^{i}(\mathbf{x}_k) \right\rangle^{2}$$

subject to this constraint. Since $\alpha_i$ is extracted from the $\phi^{i}$-mapped space, the solution is located on one of $\phi^{i}(\mathbf{x}_j)/\sqrt{k^{i}_{jj}}$ for $j = 1, \dots, n$, where $\|\phi^{i}(\mathbf{x}_j)\| = \sqrt{k^{i}_{jj}}$. Equation (10) reduces to

$$\mathrm{idx}(i) = \arg\max_{1 \le j \le n} \frac{1}{n\, k^{i}_{jj}} \sum_{k=1}^{n} \left( k^{i}_{jk} \right)^{2}. \qquad (14)$$

Each candidate $\phi^{i}(\mathbf{x}_j)/\sqrt{k^{i}_{jj}}$ already satisfies the unit-length constraint, so the normalization step that appears in SKFA is not required.
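To make the reduced objective (14) concrete, the following sketch (our own naming; it assumes `Ki` holds the $i$-th updated Gram matrix with strictly positive diagonal) returns the maximizing index:

```python
import numpy as np

def select_feature(Ki):
    """idx(i) = argmax_j (1 / (n * k^i_jj)) * sum_k (k^i_jk)^2, cf. (14)."""
    n = Ki.shape[0]
    diag = np.diag(Ki)                          # k^i_jj = ||phi^i(x_j)||^2
    scores = (Ki ** 2).sum(axis=1) / (n * diag)
    return int(np.argmax(scores))
```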
Let $\mathbf{x}_{\mathrm{idx}(i)}$ denote the image vector corresponding to the $i$-th feature. Suppose we have selected $i$ features $(\alpha_1, \dots, \alpha_i) = (\phi(\mathbf{x}_{\mathrm{idx}(1)}), \dots, \phi(\mathbf{x}_{\mathrm{idx}(i)}))\, C$, where $C \in \mathbb{R}^{i \times i}$ is the coefficient matrix, which is upper-triangular. Then

$$\phi^{i+1}(\mathbf{x}_{\mathrm{idx}(i+1)}) = \phi(\mathbf{x}_{\mathrm{idx}(i+1)}) - \sum_{\ell=1}^{i} \left\langle \phi(\mathbf{x}_{\mathrm{idx}(i+1)}), \alpha_\ell \right\rangle \alpha_\ell .$$

Let's study the second term:

$$\sum_{\ell=1}^{i} \left\langle \phi(\mathbf{x}_{\mathrm{idx}(i+1)}), \alpha_\ell \right\rangle \alpha_\ell = (\phi(\mathbf{x}_{\mathrm{idx}(1)}), \dots, \phi(\mathbf{x}_{\mathrm{idx}(i)}))\, C\, \mathbf{b},$$

where $\mathbf{b} = C^{T} \mathbf{k}$ and $\mathbf{k} = (k_{\mathrm{idx}(1)\,\mathrm{idx}(i+1)}, \dots, k_{\mathrm{idx}(i)\,\mathrm{idx}(i+1)})^{T}$. Therefore,

$$\alpha_{i+1} = \frac{\phi^{i+1}(\mathbf{x}_{\mathrm{idx}(i+1)})}{\sqrt{k^{i+1}_{\mathrm{idx}(i+1)\,\mathrm{idx}(i+1)}}},$$

and the coefficient matrix is updated by appending the column

$$C_{1:i,\,i+1} = -\frac{C\,\mathbf{b}}{\sqrt{k^{i+1}_{\mathrm{idx}(i+1)\,\mathrm{idx}(i+1)}}}, \qquad (16)$$

$$C_{i+1,\,i+1} = \frac{1}{\sqrt{k^{i+1}_{\mathrm{idx}(i+1)\,\mathrm{idx}(i+1)}}}. \qquad (17)$$
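A sketch of the corresponding coefficient update, under our indexing conventions: `C` stores the features column-wise and is upper-triangular, `i` counts previously selected features (the 0-based column `i` is being filled), `k_vec` holds $k_{\mathrm{idx}(m)\,\mathrm{idx}(i+1)}$ for $m = 1, \dots, i$, and `k_tt` is the diagonal entry $k^{i+1}_{\mathrm{idx}(i+1)\,\mathrm{idx}(i+1)}$:

```python
import numpy as np

def update_coefficients(C, i, k_vec, k_tt):
    """Append the column for the new feature, cf. (16) and (17)."""
    norm = np.sqrt(k_tt)                      # ||phi^{i+1}(x_idx(i+1))||
    if i > 0:
        b = C[:i, :i].T @ k_vec               # b = C^T k
        C[:i, i] = -(C[:i, :i] @ b) / norm    # (16)
    C[i, i] = 1.0 / norm                      # (17)
    return C
```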
The third improvement (iii) is to discard negligible data and thereby eliminate unnecessary computations. In the $i$-th updated Gram matrix $K^i$, defined in (13), the diagonal element $k^i_{jj}$ represents the reconstruction error of the $j$-th data point with respect to the previous features. If $k^i_{jj}$ is relatively small, then the previous features already contain most of the information that would be acquired from this image vector. One can therefore optionally discard image points that satisfy $k^i_{jj} < \delta$, for some predetermined $\delta > 0$. In the following, we use a cut-off threshold $\delta$, which is useful when the data size is very large. It can also serve as a criterion to stop extracting features if one is unsure of the number of features that should be selected.
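For illustration, the cut can be implemented by masking the rows and columns of the updated Gram matrix; the toy matrix below is our own example:

```python
import numpy as np

Ki = np.array([[1.0, 0.2, 0.0],
               [0.2, 0.5, 0.1],
               [0.0, 0.1, 1e-4]])    # toy i-th updated Gram matrix
delta = 1e-3                         # predetermined cut-off threshold

keep = np.diag(Ki) >= delta          # reconstruction error k^i_jj >= delta
Ki = Ki[np.ix_(keep, keep)]          # the third point is discarded
```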
The entire algorithm of AKFA is summarized below:
Step 1: Compute the Gram matrix $K^1$ with entries $k_{jk} = k(\mathbf{x}_j, \mathbf{x}_k)$, $j, k = 1, \dots, n$, where $n$ is the number of input vectors. This part requires $O(n^2)$ operations.
Step 2: Let $\ell$ denote the number of features to be extracted. Initialize the $\ell \times \ell$ coefficient matrix $C$, and $\mathrm{idx}$ as an empty list, which will ultimately store the indices of the selected image vectors. Initialize the threshold value $\delta$ for the reconstruction error. The overall cost is $O(\ell^2)$.
Step 3: For $i = 1$ to $\ell$, repeat:
1. Using the $i$-th updated Gram matrix $K^i$, extract the $i$-th feature using (14). If $k^i_{jj} < \delta$, then discard the $j$-th column and $j$-th row without calculating the projection variance. Use $\mathrm{idx}(i)$ to store the index of the selected image vector. This step requires $O(n^2)$ operations.
2. Update the coefficient matrix $C$ by using (16) and (17), which requires $O(i^2)$ operations.
3. Use (13) to obtain $K^{i+1}$, the updated Gram matrix. Neglect all rows and columns that contain a diagonal element less than $\delta$. This step requires $O(n^2)$ operations.
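Putting Steps 1 to 3 together, here is a minimal end-to-end sketch of AKFA under the conventions introduced above (all identifiers are ours; it assumes the Gram matrix `K` is precomputed and its diagonal stays nonnegative up to rounding):

```python
import numpy as np

def akfa(K, ell, delta=0.0):
    """Sketch of AKFA: returns the upper-triangular coefficient matrix C
    and the list idx of selected image-vector indices."""
    n = K.shape[0]
    Ki = K.astype(float).copy()          # current updated Gram matrix
    C = np.zeros((ell, ell))             # coefficient matrix (Step 2)
    idx = []
    active = np.ones(n, dtype=bool)      # candidates not yet discarded

    for i in range(ell):
        diag = np.diag(Ki)
        active &= diag >= delta          # improvement (iii): cut small k^i_jj
        if not active.any():
            break                        # doubles as a stopping criterion
        # Step 3.1: reduced objective (14), summed over remaining candidates.
        Ka = np.where(active[None, :], Ki, 0.0)
        scores = np.where(active,
                          (Ka ** 2).sum(axis=1) / np.where(diag > 0, diag, np.inf),
                          -np.inf)
        t = int(np.argmax(scores))
        idx.append(t)
        norm = np.sqrt(diag[t])          # ||phi^i(x_t)||
        # Step 3.2: coefficient update (16), (17); k_vec uses the original K.
        if i > 0:
            k_vec = K[idx[:i], t]
            b = C[:i, :i].T @ k_vec
            C[:i, i] = -(C[:i, :i] @ b) / norm
        C[i, i] = 1.0 / norm
        # Step 3.3: Gram matrix update (13).
        Ki = Ki - np.outer(Ki[:, t], Ki[t, :]) / Ki[t, t]
    return C, idx
```

Feature values for a new point $\mathbf{x}$ can then be obtained as $C^{T}(k(\mathbf{x}_{\mathrm{idx}(1)}, \mathbf{x}), \dots, k(\mathbf{x}_{\mathrm{idx}(\ell)}, \mathbf{x}))^{T}$, since each feature is a linear combination of the selected mapped images.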
When no data is cut during updating, the total computational complexity of the proposed AKFA is $O(\ell n^2)$. If we increase $\delta$, more data will be cut, and the total computation time will decrease. To maintain good reconstruction accuracy, $\delta$ should be small; to obtain a shorter computation time, however, a sufficiently large $\delta$ must be chosen. Since the number of data points cut at each step depends on $\delta$ and the data set, we consider an average situation. Suppose that after extracting each feature we keep only a fraction $\eta$ of all possible directions, and that one point is left after extracting $\ell$ features. That means $\eta^{\ell} n = 1$, where $n$ is the data size, which is equivalent to $\eta = n^{-1/\ell}$. The total computational time is then proportional to

$$\sum_{i=0}^{\ell-1} \left( \eta^{i} n \right)^{2} = n^{2}\, \frac{1 - \eta^{2\ell}}{1 - \eta^{2}} \approx \frac{n^{2}}{1 - \eta^{2}}$$

when $n$ is large. The computational time of SKFA is $O(\ell^2 n^2)$. Therefore,

$$\frac{T_{\mathrm{SKFA}}}{T_{\mathrm{AKFA}}} \approx \ell^{2} \left( 1 - \eta^{2} \right).$$
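As a quick numerical check of this ratio, the following snippet evaluates $\eta = n^{-1/\ell}$ and $\ell^{2}(1 - \eta^{2})$ for illustrative values of $\ell$ and $n$ (chosen by us, not taken from the experiments):

```python
ell, n = 50, 3000              # illustrative values only
eta = n ** (-1.0 / ell)        # from eta^ell * n = 1
t_akfa = n**2 / (1 - eta**2)   # approximate AKFA time for large n
t_skfa = ell**2 * n**2         # SKFA time, up to constant factors
print(round(t_skfa / t_akfa))  # ell^2 * (1 - eta^2), roughly 685 here
```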
For example, for the values of $\ell$ and the data size $n$ considered here, this ratio is about 684, so AKFA is about 684 times faster than SKFA. The experimental results confirm that our features have better performance than those obtained by SKFA. Fig. 4 summarizes the comparison of computational complexity.