In this section we present a kernel-adapted kNN classifier for the detection of polyps in CT colonography. The method exploits the idea presented in [35]. We first present the data-dependent kernel methodology as follows:

Let \{x_1, x_2, \ldots, x_n\}, x_i \in \mathbb{R}^d, be the n training samples of the given data, where y_i \in \{+1, -1\} represents the class label of the sample x_i. We formulate a data-dependent kernel to capture the relationship among the data in this classification task. This data-dependent composite kernel is formulated as:

k(x, y) = q_1(x)\, q_1(y)\, k_2(x, y) + q_2(x)\, q_2(y)\, k_3(x, y)
(20)

where k_1(\cdot,\cdot), k_2(\cdot,\cdot) and k_3(\cdot,\cdot) are the kernels we choose, and the factor functions q_1(\cdot) and q_2(\cdot) take the form

q_1(x) = \alpha_0 + \sum_{m=1}^{n} \alpha_m\, k_1(x, x_m)
(21)

q_2(x) = \beta_0 + \sum_{m=1}^{n} \beta_m\, k_1(x, x_m)
(22)

in which \alpha_m and \beta_m (m = 0, 1, \ldots, n) are the combination coefficients. Let us denote the vectors (q_1(x_1), \ldots, q_1(x_n))^T and (q_2(x_1), \ldots, q_2(x_n))^T by q_1 and q_2, and the coefficient vectors (\alpha_0, \ldots, \alpha_n)^T and (\beta_0, \ldots, \beta_n)^T by \alpha_1 and \alpha_2, respectively. We have q_i = K_1 \alpha_i, i = 1, 2, where K_1 is an n \times (n+1) matrix given as follows.

K_1 = \begin{bmatrix} 1 & k_1(x_1, x_1) & \cdots & k_1(x_1, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & k_1(x_n, x_1) & \cdots & k_1(x_n, x_n) \end{bmatrix}
(23)

Let the kernel matrices corresponding to k(x, y), k_2(x, y) and k_3(x, y) be K, K_2 and K_3, respectively. We can therefore express the composite kernel matrix K as:

K = \left[\, q_1(x_i)\, q_1(x_j)\, k_2(x_i, x_j) + q_2(x_i)\, q_2(x_j)\, k_3(x_i, x_j) \,\right]_{n \times n}
(24)

Defining Q_i as the diagonal matrix with diagonal elements q_i(x_1), \ldots, q_i(x_n), we can express eq. (24) as:

K = Q_1 K_2 Q_1 + Q_2 K_3 Q_2
(25)

This kernel model was first
introduced in [32]
and called “conformal transformation of a kernel.”
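As an illustration, the construction in eqs. (20)-(25) can be sketched in NumPy. The Gaussian and linear base kernels, the data sizes, and the random coefficient vectors below are assumptions chosen for the example, not prescribed by the method.

```python
import numpy as np

def rbf(X, Y, gamma):
    # Gaussian RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def factor_vector(K1, alpha):
    # q = [1 | K1] @ alpha, with [1 | K1] the n x (n+1) matrix of eq. (23)
    n = K1.shape[0]
    K1_ext = np.hstack([np.ones((n, 1)), K1])
    return K1_ext @ alpha

def composite_kernel_matrix(K2, K3, q1, q2):
    # K = Q1 K2 Q1 + Q2 K3 Q2  (eq. 25)
    Q1, Q2 = np.diag(q1), np.diag(q2)
    return Q1 @ K2 @ Q1 + Q2 @ K3 @ Q2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
K1 = rbf(X, X, 0.5)          # kernel used inside the factor functions
K2 = rbf(X, X, 1.0)          # first basic kernel
K3 = X @ X.T                 # second basic kernel (linear)
alpha = rng.normal(size=7)   # (n+1)-dim combination coefficients (illustrative)
beta = rng.normal(size=7)
q1, q2 = factor_vector(K1, alpha), factor_vector(K1, beta)
K = composite_kernel_matrix(K2, K3, q1, q2)
```

Entry (i, j) of K equals q_1(x_i) q_1(x_j) k_2(x_i, x_j) + q_2(x_i) q_2(x_j) k_3(x_i, x_j), matching eq. (24).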

However, the authors in [32] did not consider how to optimize the kernel model. In the following section we perform kernel optimization based on the methods presented in [35] and [36].

__B. Kernel Optimization:__

The optimization of the data-dependent kernel in (25) consists of setting the values of the combination coefficient vectors \alpha_i so that the class separability of the training data in the mapped feature space is maximized. For this purpose, the Fisher scalar is adopted as the objective function of our kernel optimization. The Fisher scalar measures the class separability of the training data in the mapped feature space and is formulated as

J_i(\alpha_i) = \frac{\operatorname{tr}(S_{bi})}{\operatorname{tr}(S_{wi})}, \quad i = 1, 2
(26)

where S_{b1}, S_{b2} represent the "between-class scatter matrices" and S_{w1}, S_{w2} are the "within-class scatter matrices" of the data mapped through the two component kernels. Suppose that the training data are grouped according to their class labels, i.e., the first n_1 data belong to one class and the remaining n_2 data belong to the other class (n_1 + n_2 = n). Then the basic kernel matrices K_2 and K_3 can be partitioned as:

K_2 = \begin{bmatrix} K_2^{11} & K_2^{12} \\ K_2^{21} & K_2^{22} \end{bmatrix} \quad \text{and} \quad K_3 = \begin{bmatrix} K_3^{11} & K_3^{12} \\ K_3^{21} & K_3^{22} \end{bmatrix}
(27)

where the sizes of the submatrices K_j^{rs} (j = 2, 3) are n_1 \times n_1, n_1 \times n_2, n_2 \times n_1 and n_2 \times n_2, respectively.

A close relation between the class separability measure J_i and the kernel matrices has been established [37] as

\operatorname{tr}(S_{bi}) = \alpha_i^T M_{0i} \alpha_i \quad \text{and} \quad \operatorname{tr}(S_{wi}) = \alpha_i^T N_{0i} \alpha_i
(28)

where

M_{0i} = K_1^T B_{0i} K_1 \quad \text{and} \quad N_{0i} = K_1^T W_{0i} K_1; \quad i = 1, 2;
(29)

and, for i = 1, 2, writing K^{(1)} \equiv K_2 and K^{(2)} \equiv K_3,

B_{0i} = \begin{bmatrix} \frac{1}{n_1} K^{(i)}_{11} & 0 \\ 0 & \frac{1}{n_2} K^{(i)}_{22} \end{bmatrix} - \frac{1}{n} K^{(i)}
(30)

W_{0i} = \operatorname{diag}\!\left(k^{(i)}(x_1, x_1), \ldots, k^{(i)}(x_n, x_n)\right) - \begin{bmatrix} \frac{1}{n_1} K^{(i)}_{11} & 0 \\ 0 & \frac{1}{n_2} K^{(i)}_{22} \end{bmatrix}
(31)
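The matrices B_{0i}, W_{0i} and M_{0i}, N_{0i} can be sketched in NumPy as follows, assuming the training rows are already sorted by class (first n_1 rows in class 1); the sample data and kernel choices below are illustrative only.

```python
import numpy as np

def scatter_matrices(K0, n1):
    # B0 and W0 for a basic kernel matrix K0 whose first n1 rows/columns
    # belong to class 1 and the remaining n2 to class 2.
    n = K0.shape[0]
    n2 = n - n1
    D = np.zeros((n, n))
    D[:n1, :n1] = K0[:n1, :n1] / n1
    D[n1:, n1:] = K0[n1:, n1:] / n2
    B0 = D - K0 / n                      # between-class part
    W0 = np.diag(np.diag(K0)) - D        # within-class part
    return B0, W0

def M0_N0(K0, K1, n1):
    # M0 = E^T B0 E and N0 = E^T W0 E with E = [1 | K1]
    B0, W0 = scatter_matrices(K0, n1)
    E = np.hstack([np.ones((K0.shape[0], 1)), K1])
    return E.T @ B0 @ E, E.T @ W0 @ E

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K0 = np.exp(-d2)            # one basic kernel (RBF, illustrative)
K1 = np.exp(-0.5 * d2)      # kernel used in the factor function
M0, N0 = M0_N0(K0, K1, n1=3)
```

By construction, \alpha^T M_0 \alpha = q^T B_0 q and \alpha^T N_0 \alpha = q^T W_0 q for q = E\alpha, which is the identity the derivation relies on.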

To maximize J_i(\alpha_i), the standard gradient approach is followed.

Let

J_{bi} = \alpha_i^T M_{0i} \alpha_i; \quad J_{wi} = \alpha_i^T N_{0i} \alpha_i
(32)

Therefore,

J_i(\alpha_i) = \frac{J_{bi}}{J_{wi}}
(33)

\frac{\partial J_i}{\partial \alpha_i} = \frac{1}{J_{wi}} \frac{\partial J_{bi}}{\partial \alpha_i} - \frac{J_{bi}}{J_{wi}^2} \frac{\partial J_{wi}}{\partial \alpha_i}
(34)

Now, finding the partial derivatives for i = 1, 2, from eq. (32):

\frac{\partial J_{bi}}{\partial \alpha_i} = 2 M_{0i} \alpha_i
(35)

\frac{\partial J_{wi}}{\partial \alpha_i} = 2 N_{0i} \alpha_i
(36)

\frac{\partial J_i}{\partial \alpha_i} = \frac{2}{J_{wi}} \left( M_{0i} \alpha_i - J_i N_{0i} \alpha_i \right)
(37)

The condition for obtaining the maximum value of J_i is that,

\frac{\partial J_i}{\partial \alpha_i} = 0

This implies that,

M_{0i} \alpha_i - J_i N_{0i} \alpha_i = 0
(38)

M_{0i} \alpha_i = J_i N_{0i} \alpha_i
(39)

From eq. (32),

J_i = \frac{\alpha_i^T M_{0i} \alpha_i}{\alpha_i^T N_{0i} \alpha_i}
(40)

so eq. (39) can be written as

\left( \alpha_i^T N_{0i} \alpha_i \right) M_{0i} \alpha_i = \left( \alpha_i^T M_{0i} \alpha_i \right) N_{0i} \alpha_i
(41)

If N_{0i}^{-1} exists, then

N_{0i}^{-1} M_{0i} \alpha_i = J_i \alpha_i
(42)

If the matrix N_{0i} is nonsingular, the optimal \alpha_i that maximizes J_i(\alpha_i) is the eigenvector corresponding to the maximum eigenvalue of the system,

N_{0i}^{-1} M_{0i} \alpha_i = \lambda \alpha_i
(43)

Unfortunately, in practice, especially in analyzing gene expression data, the matrices N_{01}, N_{02} are frequently singular, which makes the eigendecomposition method inapplicable. A gradient-based learning algorithm was proposed in [38] and [36] to solve the optimization problem. However, an alternative way to handle this problem, as suggested in [35], is to introduce a regularization parameter \mu > 0 and substitute the matrix N_{0i} with N_{0i} + \mu I, where I denotes the identity matrix. Therefore, the optimal \alpha_i that maximizes the class separability measure J_i(\alpha_i) (i = 1, 2) is determined by the eigenvector corresponding to the maximum eigenvalue of the system,

(N_{0i} + \mu I)^{-1} M_{0i} \alpha_i = \lambda \alpha_i, \quad i = 1, 2.
(44)
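The regularized eigenproblem (44) can be solved directly with standard linear algebra routines; a minimal sketch follows, where the random symmetric stand-ins for M_{0i} and N_{0i} are assumptions for the demo.

```python
import numpy as np

def optimal_alpha(M0, N0, mu=1e-3):
    # Eigenvector of (N0 + mu I)^{-1} M0 with the largest eigenvalue (eq. 44);
    # mu > 0 regularizes N0, which is frequently singular in practice.
    A = np.linalg.solve(N0 + mu * np.eye(N0.shape[0]), M0)
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)
    alpha = V[:, k].real
    return w[k].real, alpha / np.linalg.norm(alpha)

rng = np.random.default_rng(0)
C = rng.normal(size=(7, 7))
D = rng.normal(size=(7, 7))
M0 = C.T @ C                 # symmetric positive semi-definite stand-in
N0 = D.T @ D                 # likewise; possibly ill-conditioned
lam, alpha = optimal_alpha(M0, N0, mu=1e-3)
```

Because M_{0i} is positive semi-definite and N_{0i} + \mu I is positive definite, the eigenvalues of the pencil are real and non-negative, so taking the real part is safe.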

__Selecting the best kernels that suit the data:__

The criterion for selecting the best kernel function is to find the kernel that produces the largest eigenvalue in (43), i.e.,

k^{*} = \arg\max_{k} \lambda_{\max}^{(k)}
(45)

where \lambda_{\max}^{(k)} is the maximum eigenvalue of (43) obtained with candidate kernel k. The idea is that choosing the eigenvector \alpha corresponding to the maximum eigenvalue, which maximizes J_i(\alpha_i), will result in the optimum solution.

We find the maximum eigenvalues for all candidate kernel functions and arrange them in descending order to choose the most suitable kernels. For example, one possible ordering is:

\lambda_{\max}^{(1)} \ge \lambda_{\max}^{(2)} \ge \lambda_{\max}^{(3)} \ge \cdots
(46)

We then pick the kernels corresponding to the first two largest eigenvalues to form the composite kernel.
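Putting the pieces together, the eigenvalue-based kernel ranking of eqs. (45)-(46) might be prototyped as below; the candidate kernel family, the factor-function kernel, and the toy two-class data are assumptions made for the sketch.

```python
import numpy as np

def max_fisher_eigenvalue(K0, K1, n1, mu=1e-3):
    # Largest eigenvalue of (N0 + mu I)^{-1} M0 for one candidate basic kernel
    # K0, with the first n1 samples in class 1 and the rest in class 2.
    n = K0.shape[0]
    n2 = n - n1
    D = np.zeros((n, n))
    D[:n1, :n1] = K0[:n1, :n1] / n1
    D[n1:, n1:] = K0[n1:, n1:] / n2
    B0, W0 = D - K0 / n, np.diag(np.diag(K0)) - D
    E = np.hstack([np.ones((n, 1)), K1])
    M0, N0 = E.T @ B0 @ E, E.T @ W0 @ E
    w = np.linalg.eigvals(np.linalg.solve(N0 + mu * np.eye(n + 1), M0))
    return w.real.max()

# rank candidate kernels and keep the two most dominant ones
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
X[:4] += 2.0                           # first n1 = 4 samples form class 1
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
candidates = {"rbf_0.1": np.exp(-0.1 * d2),
              "rbf_1.0": np.exp(-d2),
              "linear": X @ X.T}
K1 = np.exp(-0.5 * d2)                 # kernel used in the factor function
scores = {name: max_fisher_eigenvalue(K0, K1, 4)
          for name, K0 in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
top_two = ranked[:2]                   # kernels forming the composite kernel
```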

__Optimization Algorithm:__

*Step 1:* Group the data according to their class labels. Calculate B_{0i} and W_{0i} first, and then M_{0i} = K_1^T B_{0i} K_1 and N_{0i} = K_1^T W_{0i} K_1 for i = 1, 2.

*Step 2:* Calculate the maximum eigenvalue and the corresponding eigenvector \alpha_i according to (43) (or its regularized form (44)).

*Step 3:* Arrange the eigenvalues in descending order of magnitude.

*Step 4:* Choose the kernels corresponding to the most dominant eigenvalues.

*Step 5:* Calculate q_i = K_1 \alpha_i.

*Step 6:* Calculate Q_i = \operatorname{diag}(q_i) and then compute the composite kernel matrix K from (25).

__Weighted Composite Kernel Function:__

In this section we propose a composite kernel function defined as the weighted sum of a set of different optimized kernel functions. In Section V-A we proposed a data-dependent composite kernel of the form

K = Q_1 K_2 Q_1 + Q_2 K_3 Q_2.

However, to obtain the optimum classification accuracy, we redefine the above composite kernel as

\hat{K} = \rho\, Q_1 K_2 Q_1 + (1 - \rho)\, Q_2 K_3 Q_2

where the value of the composite coefficient \rho \in [0, 1] is a constant scalar. The characteristics of the composite kernel are determined by the value of \rho: through this approach, the relative contribution of the two kernels to the model can be varied. Instead of using K as the kernel function, we will now use \hat{K} as the kernel function. According to [46], \hat{K} satisfies Mercer's condition.

__Method to assign the weight factor:__

We set up a threshold to decide the contribution of each dominant kernel. For example, consider eq. (46); here \lambda_{\max}^{(1)} \ge \lambda_{\max}^{(2)}. We therefore define the threshold as follows:

If \lambda_{\max}^{(1)} \ge 0.9\,(\lambda_{\max}^{(1)} + \lambda_{\max}^{(2)}), then \rho = 0.9;

if \lambda_{\max}^{(1)} \ge 0.8\,(\lambda_{\max}^{(1)} + \lambda_{\max}^{(2)}), then \rho = 0.8;

and so on.

The weighted composite kernel can then be implemented as follows:
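One way to realize this weighting in code is sketched below; the exact cut-off values, like the threshold rule itself, are assumptions chosen for illustration.

```python
import numpy as np

def weight_from_eigenvalues(lam1, lam2):
    # Map the dominance of the first kernel's maximum eigenvalue to the
    # scalar weight rho; the cut-offs mirror the thresholds described above.
    ratio = lam1 / (lam1 + lam2)
    if ratio >= 0.9:
        return 0.9
    if ratio >= 0.8:
        return 0.8
    return ratio                      # fall back to the raw dominance ratio

def weighted_composite(Kt1, Kt2, rho):
    # K_hat = rho * (Q1 K2 Q1) + (1 - rho) * (Q2 K3 Q2)
    return rho * Kt1 + (1.0 - rho) * Kt2

rho = weight_from_eigenvalues(9.5, 0.5)   # dominance ratio 0.95 -> rho = 0.9
K_hat = weighted_composite(np.eye(2), np.ones((2, 2)), rho)
```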

__Choosing Appropriate Parameters and Kernels:__

The following algorithm represents the method adopted for selecting the optimum kernels and their respective parameters for the given data set.

Step 1: Use a part of the training data for validation.

Step 2: Select the pair of kernels which gives the higher accuracy.

Step 3: Find the weights by the method mentioned above.

Step 4: Implement \hat{K} with the optimum parameter \rho. (The larger weight should be assigned to the kernel corresponding to the maximum classification accuracy.)

Step 5: Build the model with \hat{K} by using the whole training data.

Step 6: Test the built model on the test set.
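The validation-based selection of Steps 1-2 might be prototyped as below, using a plain kernel 1-NN as the validation classifier; the candidate kernel family, the synthetic data, and the 75/25 hold-out split are assumptions made for the sketch.

```python
import numpy as np

def kernels(X, Z):
    # candidate kernel family (illustrative): RBF, linear, quadratic polynomial
    d2 = ((X[:, None] - Z[None]) ** 2).sum(-1)
    lin = X @ Z.T
    return {"rbf": np.exp(-d2), "linear": lin, "poly2": (1.0 + lin) ** 2}

def knn1_accuracy(name, X_tr, X_va, y_tr, y_va):
    # kernel 1-NN with the kernel-induced distance
    # d^2(x, z) = k(x, x) + k(z, z) - 2 k(x, z)
    K_tr = kernels(X_tr, X_tr)[name]
    K_cr = kernels(X_va, X_tr)[name]
    kxx = np.diag(kernels(X_va, X_va)[name])
    d2 = kxx[:, None] + np.diag(K_tr)[None, :] - 2.0 * K_cr
    pred = y_tr[np.argmin(d2, axis=1)]
    return (pred == y_va).mean()

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
tr, va = np.arange(30), np.arange(30, 40)   # hold out part of the training data
acc = {name: knn1_accuracy(name, X[tr], X[va], y[tr], y[va])
       for name in ("rbf", "linear", "poly2")}
best_pair = sorted(acc, key=acc.get, reverse=True)[:2]   # the two best kernels
```

In the full method, the selected pair would then be combined and weighted as described above before the model is rebuilt on the whole training set.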