IV.     Accelerated Kernel Feature Analysis

    To further improve the efficiency and accuracy of SKFA, we propose an Accelerated Kernel Feature Analysis (AKFA) that (i) saves computation time by iteratively updating the Gram matrix, (ii) normalizes the image vectors with the ℓ2 constraint before the ℓ1 constraint is applied, and (iii) optionally discards data whose residual magnitude falls below a threshold δ during the updates.

For (i), instead of extracting features directly from the original mapped space, AKFA extracts the i-th feature based on the i-th updated Gram matrix K^i, where each element has K^i_{jk} = ⟨φ^i(x_j), φ^i(x_k)⟩, with φ^i(x_j) denoting the residual of the image φ(x_j) after its projections onto the first i features have been removed. Since ψ_i = φ^{i-1}(x_{idx_i}) / sqrt(K^{i-1}_{idx_i idx_i}),

    K^i_{jk} = K^{i-1}_{jk} − K^{i-1}_{j idx_i} K^{i-1}_{idx_i k} / K^{i-1}_{idx_i idx_i}.        (13)

By updating the Gram matrix, it is unnecessary to save the projection of each individual data point on all previous features. The computational cost of extracting the i-th feature becomes O(n²), instead of O(in²) as in SKFA.
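Concretely, the update in (13) is a rank-one downdate of the Gram matrix. The following NumPy sketch (the function name and interface are ours, for illustration only) shows the O(n²) step:

```python
import numpy as np

def update_gram(K, idx):
    """One application of (13): subtract from every entry the component of
    the mapped data along the newly selected direction phi^{i-1}(x_idx).
    A single O(n^2) rank-one downdate, independent of how many features
    have already been extracted."""
    col = K[:, idx]
    return K - np.outer(col, col) / K[idx, idx]
```

After the update, the idx-th row and column are zero, reflecting that the selected point has no remaining residual, and the diagonal of the result gives the reconstruction errors used by improvement (iii).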

    The second (ii) improvement is to revise the ℓ1 constraint. As shown in (10), SKFA treats each individual sample as a possible direction and computes the projection variances with all data. Since SKFA includes the length of each image vector in its projection-variance calculation, it is biased toward selecting vectors with larger magnitude. From the objective function (8), we know that we are actually looking for a direction with unit length. When we choose an image vector as a possible direction, we should ignore its length and consider only its direction, which improves the accuracy of the features. Therefore, in our AKFA algorithm, we replace the ℓ1 constraint (9) by

    ψ ∈ { φ^{i-1}(x_j) / ‖φ^{i-1}(x_j)‖ : j = 1, …, n }.

The i-th feature is extracted by

    ψ_i = argmax_ψ (1/n) Σ_{k=1}^{n} ⟨ψ, φ^{i-1}(x_k)⟩²        (14)

over this constraint set. Since ψ_i is extracted from the φ^{i-1} space, the solution is located on one of φ^{i-1}(x_j)/sqrt(K^{i-1}_{jj}) for j = 1, …, n, because ‖φ^{i-1}(x_j)‖ = sqrt(K^{i-1}_{jj}). Equation (10) reduces to

    idx_i = argmax_j (1/n) Σ_{k=1}^{n} (K^{i-1}_{jk})² / K^{i-1}_{jj}.        (15)

Each φ^{i-1}(x_j)/sqrt(K^{i-1}_{jj}) satisfies the ℓ2 constraint, so the normalization step that appears in SKFA is not required.
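The index selection in (15) can be sketched as follows (a hypothetical helper of our own; rows whose residual norm has vanished are skipped rather than divided by):

```python
import numpy as np

def select_feature(K_res, eps=1e-12):
    """Evaluate (15) on the current residual Gram matrix K_res:
    score_j = (1/n) * sum_k K_res[j, k]**2 / K_res[j, j],
    the mean squared projection onto the unit-normalized residual
    direction phi^{i-1}(x_j) / sqrt(K_res[j, j])."""
    n = K_res.shape[0]
    diag = np.diag(K_res)
    valid = diag > eps                       # skip fully explained points
    scores = np.full(n, -np.inf)
    scores[valid] = (K_res[valid] ** 2).sum(axis=1) / (n * diag[valid])
    return int(np.argmax(scores))
```

Dividing each row's squared projections by K_res[j, j] is exactly the ℓ2 normalization of improvement (ii): a candidate wins on direction, not on magnitude.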

Let x_{idx_i} denote the image vector corresponding to the i-th feature. Suppose we have selected i−1 features with ψ_j = Σ_{k=1}^{j} C_{kj} φ(x_{idx_k}) for j = 1, …, i−1, where C is the coefficient matrix, which is upper-triangular. Then

    ψ_i = ( φ(x_{idx_i}) − Σ_{j=1}^{i−1} ⟨φ(x_{idx_i}), ψ_j⟩ ψ_j ) / sqrt(K^{i-1}_{idx_i idx_i}).

Let us study the second term:

    Σ_{j=1}^{i−1} ⟨φ(x_{idx_i}), ψ_j⟩ ψ_j = Σ_{j=1}^{i−1} ( Σ_{k=1}^{j} C_{kj} K_{idx_i idx_k} ) ψ_j = Σ_{j=1}^{i−1} T_j ψ_j,

where T_j = Σ_{k=1}^{j} C_{kj} K_{idx_i idx_k}. Therefore,

    ψ_i = ( φ(x_{idx_i}) − Σ_{j=1}^{i−1} T_j Σ_{k=1}^{j} C_{kj} φ(x_{idx_k}) ) / sqrt(K^{i-1}_{idx_i idx_i})

        = ( φ(x_{idx_i}) − Σ_{k=1}^{i−1} ( Σ_{j=k}^{i−1} T_j C_{kj} ) φ(x_{idx_k}) ) / sqrt(K^{i-1}_{idx_i idx_i}),

then ψ_i = Σ_{k=1}^{i} C_{ki} φ(x_{idx_k}), where

    C_{ki} = − ( Σ_{j=k}^{i−1} T_j C_{kj} ) / sqrt(K^{i-1}_{idx_i idx_i}),   k = 1, …, i−1,        (16)

and

    C_{ii} = 1 / sqrt(K^{i-1}_{idx_i idx_i}).        (17)
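The recursion (16)-(17) is a Gram–Schmidt step expressed entirely through kernel values. A minimal NumPy sketch (0-based indices; names are ours) that fills one column of C:

```python
import numpy as np

def update_coefficients(C, K, idx, i, resid):
    """Fill column i of the upper-triangular matrix C so that
    psi_i = sum_{k<=i} C[k, i] * phi(x_{idx[k]}), per (16)-(17).
    K is the ORIGINAL Gram matrix, idx the selected indices
    (idx[i] newest), resid = K^{i-1}_{idx_i, idx_i}."""
    s = 1.0 / np.sqrt(resid)            # eq. (17)
    kvec = K[idx[i], idx[:i]]           # K_{idx_i, idx_k} for k < i
    t = C[:i, :i].T @ kvec              # T_j = <phi(x_{idx_i}), psi_j>
    C[:i, i] = -s * (C[:i, :i] @ t)     # eq. (16)
    C[i, i] = s
    return C
```

Under a linear kernel one can check directly that the resulting ψ_i are orthonormal, which is what the derivation above guarantees.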

    The third (iii) improvement is to discard negligible data and thereby eliminate unnecessary computations. In the i-th updated Gram matrix K^i, defined in (13), the diagonal element K^i_{jj} represents the reconstruction error of the j-th data point with respect to the previous i features. If K^i_{jj} is relatively small, then the previous i features already contain most of the information that would be acquired from this image vector. One can therefore optionally discard image points that satisfy K^i_{jj} < δ, for some predetermined δ > 0. In the following, we use a cut-off threshold δ that is useful when the data size is very large. It can also be used as a criterion to stop extracting features if one is unsure of the number of features that should be selected.

The entire algorithm of AKFA is summarized below:

Step 1:       Compute the n × n Gram matrix K, with K_{jk} = ⟨φ(x_j), φ(x_k)⟩, where n is the number of input vectors. This part requires O(n²) operations.

Step 2:       Let ℓ denote the number of features to be extracted.  Initialize the ℓ × ℓ coefficient matrix C to zero, and idx as an empty list which will ultimately store the indices of the selected image vectors. Initialize the threshold value δ for the reconstruction error.  The overall cost is O(ℓ²).

Step 3:  For i = 1 to ℓ repeat:

1. Using the updated Gram matrix K^{i-1} (with K^0 = K), extract the i-th feature using (14). If K^{i-1}_{jj} < δ, then discard the j-th column and j-th row without calculating the projection variance. Use idx to store the selected index idx_i. This step requires O(n²) operations.

2. Update the coefficient matrix by using (16) and (17), which requires O(i²) operations.

3. Use (13) to obtain K^i, the updated Gram matrix. Neglect all rows and columns that contain a diagonal element less than δ.  This step requires O(n²) operations.
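Putting Steps 1–3 together, the loop can be sketched as follows. This is our own illustrative implementation, not the authors' code: it assumes the Gram matrix is precomputed, drops the constant 1/n factor from the scores, and uses δ both to cut data and as an early stopping rule.

```python
import numpy as np

def akfa(K, ell, delta=0.0):
    """Illustrative AKFA loop. K: precomputed n x n Gram matrix,
    ell: number of features to extract, delta: cut-off threshold.
    Returns (idx, C) with psi_i = sum_k C[k, i] * phi(x_{idx[k]})."""
    n = K.shape[0]
    R = K.astype(float).copy()              # current updated Gram matrix
    C = np.zeros((ell, ell))
    idx = []
    for i in range(ell):
        diag = np.diag(R).copy()
        active = diag > delta               # cut data with small residual
        if not active.any():
            break                           # delta doubles as a stop rule
        scores = np.full(n, -np.inf)        # Step 3.1: eq. (15), 1/n omitted
        scores[active] = (R[np.ix_(active, active)] ** 2).sum(axis=1) / diag[active]
        j = int(np.argmax(scores))
        idx.append(j)
        s = 1.0 / np.sqrt(R[j, j])          # Step 3.2: eqs. (16)-(17)
        t = C[:i, :i].T @ K[j, idx[:i]]
        C[:i, i] = -s * (C[:i, :i] @ t)
        C[i, i] = s
        R = R - np.outer(R[:, j], R[:, j]) / R[j, j]   # Step 3.3: eq. (13)
    return idx, C[:len(idx), :len(idx)]
```

Because each pass touches only the surviving rows and columns of R, a larger δ directly shrinks the per-iteration O(n²) cost, which is the behavior analyzed below.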

The total computational complexity of the proposed AKFA rises to O(ℓn²) when no data is cut during the updates.   If we increase δ, more data will be cut, and the total computational time will decrease.  If we choose to maintain good reconstruction accuracy, δ should be small.  However, if we choose a shorter computation time, a large enough δ must be chosen.  Since the number of data points being cut at each step depends on δ and on the data set, we consider an average situation.   Suppose that after extracting a feature, we keep only a fraction c of all possible directions, and let there be one point left after extracting ℓ features. That means n c^ℓ = 1, where n is the data size. It is equal to c = n^{−1/ℓ}.    The total computational time T_AKFA is

    T_AKFA = Σ_{i=1}^{ℓ} (n c^{i−1})² = n² (1 − c^{2ℓ}) / (1 − c²) ≈ n² / (1 − n^{−2/ℓ})

when n is large. The computational time of SKFA is T_SKFA = Σ_{i=1}^{ℓ} i n² = ℓ(ℓ+1) n² / 2 ≈ ℓ² n² / 2.  Therefore,

    T_SKFA / T_AKFA ≈ (ℓ² / 2) (1 − n^{−2/ℓ}).

For example, for representative choices of ℓ and n this ratio is about 684, so AKFA is about 684 times faster than SKFA.  The experimental results confirm that our features have better performance than those obtained by SKFA.  Fig. 4 summarizes the comparison for computational complexity.
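The ratio above is easy to evaluate numerically. In the sketch below, the values of ℓ and n are hypothetical inputs of our own choosing, not the ones used in the example:

```python
def akfa_speedup(ell, n):
    """Approximate average-case speedup T_SKFA / T_AKFA
    = (ell**2 / 2) * (1 - n**(-2.0 / ell)), per the analysis above."""
    return (ell ** 2 / 2) * (1 - n ** (-2.0 / ell))
```

For instance, akfa_speedup(100, 1000) is on the order of several hundred, and the predicted speedup grows with both the number of features ℓ and the data size n.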