# IV. Accelerated kernel feature analysis

To further improve the efficiency and accuracy of SKFA,
we propose an *Accelerated Kernel Feature Analysis* (AKFA) that (*i*)
saves computation time by iteratively updating the Gram matrix, (*ii*)
improves accuracy by normalizing each candidate image vector to unit
length before it is considered as a feature direction, and (*iii*) optionally
discards data whose residual magnitude falls below a threshold *δ* during
updates.

For improvement (*i*), instead of extracting features directly from
the original mapped space, AKFA extracts the *i*-th feature from the *i*-th
updated Gram matrix $K^i$, whose elements are
$k^i_{jl} = \langle \phi^i(x_j), \phi^i(x_l) \rangle$. Since
$\phi^i(x_j) = \phi^{i-1}(x_j) - \langle \phi^{i-1}(x_j), \alpha_{i-1} \rangle\, \alpha_{i-1}$,

$$k^i_{jl} = k^{i-1}_{jl} - \frac{k^{i-1}_{j\,\mathrm{idx}_{i-1}}\; k^{i-1}_{l\,\mathrm{idx}_{i-1}}}{k^{i-1}_{\mathrm{idx}_{i-1}\,\mathrm{idx}_{i-1}}} \qquad (13)$$

By updating the Gram matrix, it becomes unnecessary to store the
projection of each individual data point onto all previous features. The computational
cost of extracting the *i*-th feature becomes $O(n^2)$, instead of $O(in^2)$ as in SKFA.
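As a minimal sketch of the update (13) in its standard deflation form, the following NumPy snippet removes from every Gram-matrix entry the component explained by the selected image vector; `deflate` is our own illustrative name, not part of the paper:

```python
import numpy as np

def deflate(K, idx):
    """One Gram-matrix update per (13): subtract from every entry the
    component explained by the selected (pivot) image vector x_idx.
    K is the previous updated Gram matrix; returns the next one."""
    col = K[:, idx]                      # k_{j,idx} for all j
    return K - np.outer(col, col) / K[idx, idx]

# Tiny example: 1-D points under a linear kernel (all collinear, rank 1),
# so a single deflation explains everything and zeroes the whole matrix.
X = np.array([[1.0], [2.0], [3.0]])
K = X @ X.T
K1 = deflate(K, 2)                       # deflate by the third point
print(np.allclose(K1, 0.0))             # True
```

The pivot's own row and column always become (numerically) zero after deflation, which is what lets step *i* ignore directions already captured.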

The second (*ii*) improvement is to revise the $\ell_1$ constraint. As
shown in (10), SKFA treats each individual sample as a possible direction
and computes the projection variances with all data. Since SKFA includes each
vector's length in its projection-variance calculation, it is biased toward
vectors of larger magnitude. From the objective function (8), we know that we are
actually looking for a direction with unit length. When we choose an image
vector as a possible direction, we therefore ignore its length and consider only its
direction, which improves the accuracy of the features. Accordingly, in our AKFA
algorithm, we replace the $\ell_1$ constraint (9) by

$$\alpha_i \in \left\{ \frac{\phi^i(x_j)}{\|\phi^i(x_j)\|} : j = 1, \ldots, n \right\}.$$

The *i*-th feature is extracted by

$$\alpha_i = \frac{\phi^i(x_{\mathrm{idx}_i})}{\|\phi^i(x_{\mathrm{idx}_i})\|} = \frac{\phi^i(x_{\mathrm{idx}_i})}{\sqrt{k^i_{\mathrm{idx}_i\,\mathrm{idx}_i}}}.$$

Since $\alpha_i$ is extracted from the deflated space $\phi^i$, the solution is
located on one of $\phi^i(x_j)/\sqrt{k^i_{jj}}$ for
$j = 1, \ldots, n$. Equation
(10) reduces to

$$\mathrm{idx}_i = \arg\max_{1 \le j \le n} \; \frac{1}{n\, k^i_{jj}} \sum_{l=1}^{n} \left(k^i_{jl}\right)^2 \qquad (14)$$

Each candidate $\phi^i(x_j)/\sqrt{k^i_{jj}}$ satisfies the unit-norm constraint, so the
normalization step that appears in SKFA is not required.
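Assuming (14) has the normalized-projection-variance form above, the selection is a single vectorized pass over the Gram matrix; `select_feature` is our own name for this step:

```python
import numpy as np

def select_feature(K, eps=1e-12):
    """Pick the candidate maximizing the mean squared projection of all
    data onto the unit direction phi(x_j)/||phi(x_j)||, i.e. the score
    (1/(n*k_jj)) * sum_l k_jl**2 from (14)."""
    n = K.shape[0]
    diag = np.diag(K).copy()
    scores = (K ** 2).sum(axis=1) / (n * np.maximum(diag, eps))
    scores[diag <= eps] = -np.inf        # skip fully-explained points
    return int(np.argmax(scores)), scores

# Small positive semidefinite Gram matrix of three points.
K = np.array([[4.0, 2.0, 0.0],
              [2.0, 2.0, 1.0],
              [0.0, 1.0, 3.0]])
idx, scores = select_feature(K)          # idx == 0 here
```

Note that dividing by $k_{jj}$ is exactly what removes the magnitude bias: a long vector no longer wins merely by being long.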

Let $\phi(x_{\mathrm{idx}_i})$ denote the image vector corresponding
to the *i*-th feature. Suppose we have selected $i-1$ features with
$\alpha_j = \sum_{l=1}^{j} C_{lj}\, \phi(x_{\mathrm{idx}_l})$, where
$j = 1, \ldots, i-1$, $\mathrm{idx}_l$ is the index of the *l*-th selected
image vector, and $C$ is the coefficient matrix, which is upper-triangular. Then

$$\alpha_i = \frac{1}{\sqrt{k^i_{\mathrm{idx}_i\,\mathrm{idx}_i}}} \left( \phi(x_{\mathrm{idx}_i}) - \sum_{j=1}^{i-1} \langle \phi(x_{\mathrm{idx}_i}), \alpha_j \rangle\, \alpha_j \right).$$

Let's study the second term:

$$\sum_{j=1}^{i-1} \langle \phi(x_{\mathrm{idx}_i}), \alpha_j \rangle\, \alpha_j = \sum_{j=1}^{i-1} t_j \sum_{l=1}^{j} C_{lj}\, \phi(x_{\mathrm{idx}_l}) = \sum_{l=1}^{i-1} \left( \sum_{j=l}^{i-1} C_{lj}\, t_j \right) \phi(x_{\mathrm{idx}_l}) \qquad (15)$$

where $t_j = \sum_{l=1}^{j} C_{lj}\, k(x_{\mathrm{idx}_i}, x_{\mathrm{idx}_l})$. Therefore,

$$\alpha_i = \frac{1}{\sqrt{k^i_{\mathrm{idx}_i\,\mathrm{idx}_i}}} \left( \phi(x_{\mathrm{idx}_i}) - \sum_{l=1}^{i-1} \left( \sum_{j=l}^{i-1} C_{lj}\, t_j \right) \phi(x_{\mathrm{idx}_l}) \right).$$

Let

$$C_{ii} = \frac{1}{\sqrt{k^i_{\mathrm{idx}_i\,\mathrm{idx}_i}}} \qquad (16)$$

and

$$C_{li} = -\frac{\sum_{j=l}^{i-1} C_{lj}\, t_j}{\sqrt{k^i_{\mathrm{idx}_i\,\mathrm{idx}_i}}}, \qquad l = 1, \ldots, i-1, \qquad (17)$$

then $\alpha_i = \sum_{l=1}^{i} C_{li}\, \phi(x_{\mathrm{idx}_l})$, so the new feature again has the assumed form and $C$ remains upper-triangular.
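The column update (16)–(17) can be sketched directly; the function below is our own reconstruction, and the check exploits the fact that under a linear kernel ($\phi$ = identity) the procedure is Gram–Schmidt in disguise, so the extracted features should come out orthonormal:

```python
import numpy as np

def update_coefficients(C, K0, idx_list, pivot_diag):
    """Append a column to the upper-triangular coefficient matrix C so that
    alpha_i = sum_l C[l, i] * phi(x_{idx_l}), following (16) and (17).
    K0: original Gram matrix; idx_list: indices chosen so far (pivot last);
    pivot_diag: the deflated diagonal k^i at the new pivot."""
    i = len(idx_list) - 1                 # 0-based position of the new pivot
    p = idx_list[-1]
    s = np.sqrt(pivot_diag)
    # t_j = <phi(x_p), alpha_j> = sum_l C[l, j] * k(x_p, x_{idx_l})
    t = [sum(C[l, j] * K0[p, idx_list[l]] for l in range(j + 1))
         for j in range(i)]
    for l in range(i):                    # (17): off-diagonal entries
        C[l, i] = -sum(C[l, j] * t[j] for j in range(l, i)) / s
    C[i, i] = 1.0 / s                     # (16): diagonal entry
    return C

# Sanity check with a linear kernel, where phi is the identity map.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
K0 = X @ X.T
K = K0.copy()
C, idx_list = np.zeros((3, 3)), []
for i in range(3):
    p = int(np.argmax(np.diag(K)))        # simple pivot rule for this check
    idx_list.append(p)
    C = update_coefficients(C, K0, idx_list, K[p, p])
    K = K - np.outer(K[:, p], K[:, p]) / K[p, p]   # deflation as in (13)
A = np.array([sum(C[l, i] * X[idx_list[l]] for l in range(i + 1))
              for i in range(3)])
print(np.allclose(A @ A.T, np.eye(3)))    # True: features are orthonormal
```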

The third (*iii*) improvement is to discard
negligible data and thereby eliminate unnecessary computations. In the *i*-th
updated Gram matrix $K^i$ defined in (13), the diagonal element
$k^i_{jj} = \|\phi^i(x_j)\|^2$ represents
the reconstruction error of the *j*-th data point with respect to the
previous $i-1$ features.
If $k^i_{jj}$ is
relatively small, then the previous $i-1$ features contain most of the
information that would be acquired from this image vector. One can therefore
optionally discard image points that satisfy $k^i_{jj} < \delta$, for some
predetermined threshold $\delta > 0$. In the
following, we use a *cut-off threshold* that is useful when the
data size is very large. It can also be used as a criterion to stop extracting
features if one is unsure of the number of features that should be selected.
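In code, the cut-off amounts to maintaining a boolean mask over the data and dropping any point whose deflated diagonal falls below *δ*; `cut_negligible` is an illustrative helper of ours:

```python
import numpy as np

def cut_negligible(K, active, delta):
    """Deactivate points whose deflated diagonal k^i_{jj} (their squared
    residual after projecting onto the features found so far) drops below
    the cut-off threshold delta. `active` is a boolean mask over the data."""
    return active & (np.diag(K) >= delta)

K = np.diag([0.5, 1e-4, 2.0])            # deflated diagonals of 3 points
active = np.ones(3, dtype=bool)
active = cut_negligible(K, active, delta=1e-3)
print(active)                             # [ True False  True]
```

When `active` becomes empty, extraction can stop, which is the stopping criterion mentioned above.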

The entire algorithm of AKFA is summarized below:

**Step 1:** Compute the $n \times n$ Gram matrix $K$, where *n* is
the number of input vectors. This part requires $O(n^2)$ operations.

**Step 2:** Let $\ell$ denote the number of
features to be extracted. Initialize the $\ell \times \ell$ coefficient matrix
$C$ to zero, and $\mathrm{idx}$ as an empty list which will
ultimately store the indices of the selected image vectors. Initialize the
threshold value $\delta$ for
the reconstruction error. The overall cost is $O(\ell^2)$.

**Step 3:** For $i = 1$ to $\ell$ repeat:

1. Using the *i*-th updated Gram matrix $K^i$, extract the *i*-th
feature using (14). If $k^i_{jj} < \delta$, then discard the *j*-th column and *j*-th
row without calculating the projection variance. Use $\mathrm{idx}$ to store the index
of the selected vector.
This step requires $O(n^2)$ operations.

2. Update the coefficient matrix $C$ by using
(16) and (17), which requires $O(i^2)$ operations.

3. Use (13) to obtain $K^{i+1}$, the updated Gram matrix.
Neglect all rows and columns that contain a diagonal element less than $\delta$. This step requires
$O(n^2)$ operations.
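Steps 1–3 can be combined into one compact sketch. This is our own reconstruction, with the selection rule (14), the coefficient update (16)–(17), and the deflation (13) in the forms assumed above; the linear-kernel check again relies on the features being orthonormal in that case:

```python
import numpy as np

def akfa(K, ell, delta=0.0, eps=1e-12):
    """Sketch of the AKFA loop: pick ell features from an n x n Gram
    matrix by (14), deflate by (13), and cut points whose deflated
    diagonal falls below delta (improvement iii)."""
    K0, K = K, K.copy()
    n = K.shape[0]
    C = np.zeros((ell, ell))
    idx = []
    for i in range(ell):
        d = np.diag(K).copy()
        alive = d > max(delta, eps)
        if not alive.any():
            break                          # nothing informative remains
        K[~alive, :] = 0.0                 # neglect cut rows and columns
        K[:, ~alive] = 0.0
        scores = (K ** 2).sum(axis=1) / (n * np.where(alive, d, 1.0))
        scores[~alive] = -np.inf
        p = int(np.argmax(scores))         # selection rule (14)
        idx.append(p)
        s = np.sqrt(d[p])
        t = [sum(C[l, j] * K0[p, idx[l]] for l in range(j + 1))
             for j in range(i)]
        for l in range(i):                 # (17)
            C[l, i] = -sum(C[l, j] * t[j] for j in range(l, i)) / s
        C[i, i] = 1.0 / s                  # (16)
        K = K - np.outer(K[:, p], K[:, p]) / d[p]   # (13)
    return idx, C[:len(idx), :len(idx)]

# With a linear kernel the extracted features should be orthonormal.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))
idx, C = akfa(X @ X.T, ell=3)
A = np.array([sum(C[l, i] * X[idx[l]] for l in range(i + 1))
              for i in range(len(idx))])
print(np.allclose(A @ A.T, np.eye(len(idx))))
```

Any kernel's Gram matrix can be passed in place of `X @ X.T`; the loop itself never touches the raw inputs.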

The total computational complexity of the proposed AKFA is $O(\ell n^2)$ when no data is cut
during updating. If we increase $\delta$, more data will be cut, and
the total computation time will decrease. If we choose to maintain good reconstruction
accuracy, $\delta$ should
be small; however, if we choose a shorter computation time, a large enough *δ*
must be chosen. Since the number of data points cut at each step depends on $\delta$ and on the data set, we
consider an average situation. Suppose that after extracting a feature, we only
keep a fraction $\eta$ of
all possible directions, and that one point is left after extracting $\ell$ features. That means
$\eta^{\ell} n = 1$, where *n*
is the data size, or equivalently $\ell = \ln n / \ln(1/\eta)$. The total computational time $T_{\mathrm{AKFA}}$ is

$$T_{\mathrm{AKFA}} = \sum_{i=0}^{\ell-1} \left(\eta^i n\right)^2 = n^2\, \frac{1-\eta^{2\ell}}{1-\eta^2} \approx \frac{n^2}{1-\eta^2} \qquad (18)$$

when *n* is large. The computational time of SKFA is $T_{\mathrm{SKFA}} = \sum_{i=1}^{\ell} i\, n^2 \approx \ell^2 n^2 / 2$. Therefore,

$$\frac{T_{\mathrm{SKFA}}}{T_{\mathrm{AKFA}}} \approx \frac{\ell^2 \left(1-\eta^2\right)}{2} \qquad (19)$$
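Assuming the total cost sums the per-step cost $(\eta^i n)^2$ geometrically as in (18), the closed form can be checked numerically; the values of $\eta$ and $n$ below are arbitrary, chosen only for illustration:

```python
import numpy as np

eta, n = 0.9, 1000.0                               # illustrative values only
ell = int(np.ceil(np.log(n) / np.log(1.0 / eta)))  # eta**ell * n ≈ 1
direct = sum((eta ** i * n) ** 2 for i in range(ell))
closed = n ** 2 * (1 - eta ** (2 * ell)) / (1 - eta ** 2)
print(np.isclose(direct, closed))        # True: geometric sum matches (18)
print(closed / n ** 2)                   # ≈ 1/(1 - eta**2) ≈ 5.26 here
```

So even a modest retention fraction $\eta$ collapses the cost from $O(\ell n^2)$ to a small constant multiple of $n^2$.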

For example, for one representative choice of $\eta$ and data size *n*,
$T_{\mathrm{SKFA}}/T_{\mathrm{AKFA}}$ is about 684, so AKFA is roughly 684
times faster than SKFA. The experimental results confirm that our
features have better performance than those obtained by SKFA. Fig. 4
summarizes the comparison of computational complexity.