---
title: "Introduction to Machine Learning Classification lab for Mass Spectrometry data"
author: "Francisco Madrid, Toni Pardo"
date: "2023-03-30"
output:
  word_document: default
  pdf_document: null
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Objectives
Introduce the following feature extraction and classification algorithms:
• Principal Component Analysis (PCA)
• k-Nearest Neighbours classifiers (kNN)
Apply those algorithms to spectra from prostate tissues measured with Surface-Enhanced Laser Desorption/Ionization (SELDI) Mass Spectrometry (from the ChemometricsWithRData package).
# Dataset
We will use the prostate mass spectra from the ChemometricsWithRData package. Since this package is no longer in the CRAN repository, you will need to install it directly from the ChemometricsWithR compressed file provided on the virtual campus.
The Prostate2000Raw data contains 654 mass spectra, belonging to 327 subjects (two replicates per subject).
Each subject belongs to one of the following groups:
• patients with prostate cancer
• patients with benign prostatic hyperplasia
• control subjects
These data were made public in papers 1 and 2, cited below.
# Procedure
In this lab we will explore several feature extraction and classification techniques applied to proteomic data. These techniques are:
• Principal Component Analysis
• K-Nearest Neighbours
---
<font size="1"> 1 B.L. Adam, Y.Qu, J.W. Davis, M.D. Ward, M.A. Clements, L.H. Cazares, O.J. Semmes, P.F. Schellhammer, Y. Yasui, Z. Feng, G.L. Wright, “Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men”, Cancer Res. 63, 3609-3614, (2002)
2 Y. Qu, B.L. Adam, Y. Yasui, M.D. Ward, L.H. Cazares, P.F. Schellhammer, Z. Feng, O.J. Semmes, G.L. Wright , “Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients”, Clinical Chemistry, 48, 1835-1843, (2002)</font>
We start by loading all the needed packages and the data from the RStudio console. Optionally, you can obtain the same result with the following lines of code:
`# run install.packages only if the packages are not already installed!`
`install.packages(c("ChemometricsWithR", "MASS", "pls"))`
`install.packages(c("e1071", "sfsmisc", "class", "caret", "lolR"))`
`packageurl <- "http://cran.r-project.org/src/contrib/Archive/ChemometricsWithRData/ChemometricsWithRData_0.1.3.tar.gz"`
`install.packages(packageurl, repos = NULL, type = "source")`
```{r warning=FALSE}
library("ChemometricsWithR")
library("MASS")
library("pls")
library("sfsmisc")
library("e1071")
library("class")
library("caret")
library("lolR")
```
# Loading the data
```{r}
data(Prostate2000Raw, package = "ChemometricsWithRData")
mz_prost <- Prostate2000Raw$mz
intensity_with_replicates <- Prostate2000Raw$intensity
medical_cond <- Prostate2000Raw$type
levels(medical_cond)
print("benign prostatic hyperplasia = bph")
print("benign prostatic hyperplasia = control")
print("patients with prostate cancer = pca")
```
# Preprocessing
The spectra from the Prostate2000Raw dataset are already baseline corrected and normalized, according to the help page. We will perform two additional preprocessing steps:
• Replicate averaging
• Log transformation
# Replicate averaging
As each subject is measured twice, we will average consecutive spectra (belonging to the same subject):
```{r}
num_subjects <- ncol(intensity_with_replicates)/2
intensity_avg <- matrix(0, nrow = nrow(intensity_with_replicates), ncol = num_subjects)
for (i in seq(1, num_subjects)) {
intensity_avg[, i] <- rowMeans(intensity_with_replicates[, c(2*i - 1, 2*i)])
}
```
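As a side note, the same averages can be computed without the loop. A minimal vectorized sketch (it assumes, as above, that the two replicates of each subject occupy consecutive columns):
```{r}
# Average consecutive pairs of columns in a single vectorized step:
odd_cols  <- seq(1, ncol(intensity_with_replicates), by = 2)
even_cols <- seq(2, ncol(intensity_with_replicates), by = 2)
intensity_avg_vec <- (intensity_with_replicates[, odd_cols] +
                      intensity_with_replicates[, even_cols]) / 2
# Should agree with the loop-based result (dimnames dropped for the comparison):
all.equal(intensity_avg, unname(intensity_avg_vec))
```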
medical_cond has 654 class values, one for each spectrum. We take one of every two values to obtain 327 values, one for each subject in our intensity_avg matrix:
```{r}
subject_type <- medical_cond[seq(from = 1, to = 654, by = 2)]
```
# Log transformation
The log transformation converts the intensities to their logarithmic values. The measured intensities span a wide dynamic range: the same peak can be much larger in some spectra than in others. The distribution of intensities in the spectra is non-Gaussian, and a log transform makes it closer to a Gaussian distribution.
Having Gaussian-like data is beneficial for PCA, as PCA is based on the covariance matrix.
Therefore, if we build our models on the logarithm of the intensities instead of the raw intensities, we will better capture the information of the largest peaks in the mass spectra, as their histogram will more closely resemble a Gaussian.
```{r}
# First we transform the intensity values to a log scale. To do that, we
# create a copy of our data and then transform it:
intensity_log <- intensity_avg
# Values close to zero would go to -infinity. We want to avoid that. A simple
# solution is to use a threshold:
intensity_log[intensity_log < 5e-3] <- 5e-3
# log transformation:
intensity_log <- log10(intensity_log)
```
If you look at a given m/z variable and check its intensity values across subjects, you can see that our variables are not normally distributed: there are a few values with large intensities and the distribution is not symmetric:
```{r}
hist(intensity_avg[2000,], breaks = 200, xlab = "Intensity (a.u.)",
main = sprintf("Histogram of 327 raw intensities with m/z = %f Da", mz_prost[2000]))
hist(intensity_log[2000,], breaks = 200, xlab = "log-Intensity (a.u.)",
main = sprintf("Histogram of 327 log intensities with m/z = %f Da", mz_prost[2000]))
```
The classification algorithms assume that each sample is given in a row, therefore we need to transpose the intensity matrix:
```{r}
intensity <- t(intensity_log)
```
Let’s just check the dimensionality to confirm:
```{r}
message("Number of samples: ", nrow(intensity))
message("Number of variables: ", ncol(intensity))
```
The balance of sample types in the dataset matters for many algorithms. If we had very few samples of a particular class (for instance, very few benign prostatic hyperplasia subjects), we would have to consider either (i) looking for more samples of that class, (ii) dropping all the hyperplasia samples and simplifying the experiment, or (iii) using algorithms able to work with unbalanced datasets.
```{r}
table(subject_type)
```
**Is the dataset balanced? What is the percentage of samples of each class?**
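As a quick check (a small sketch, not needed for the rest of the lab), the class percentages can be obtained from the same table:
```{r}
# Percentage of samples per class, rounded to one decimal:
round(100 * prop.table(table(subject_type)), 1)
```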
As you may see, the dimensionality of the raw data is quite high. The usual procedure to reduce this type of data consists of finding common peaks and integrating their areas (alongside smoothing, binning, peak alignment, normalization and other signal processing steps that enhance signal quality).
However, to keep things simple we will follow a brute-force strategy (not optimal, but easy): we will treat every single point in the spectra as a distinct feature. This brute-force strategy is sometimes used in bioinformatics, but it generally does not provide the best results.
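For illustration only, here is a minimal sketch of the binning idea mentioned above; the bin size is an arbitrary choice and the binned matrix is not used in the rest of the lab:
```{r}
# Group consecutive m/z points into bins of a fixed size and sum the intensity in each bin
# (10 points per bin is an arbitrary choice for this sketch):
points_per_bin <- 10
bin_id <- ceiling(seq_len(nrow(intensity_avg)) / points_per_bin)
intensity_binned <- apply(intensity_avg, 2, function(spectrum) tapply(spectrum, bin_id, sum))
dim(intensity_binned)  # about nrow(intensity_avg)/points_per_bin bins for each of the 327 subjects
```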
# Train/Test division
To estimate how our trained model will perform, we need to split the dataset into a training subset and a test subset. The training subset will be used to train the model, and the test subset will be used to estimate its performance.
We will use 60% of the samples for training and 40% for testing, keeping the training and test subsets balanced with respect to the subject condition (a stratified split).
```{r}
pca_idx <- which(subject_type == "pca")
pca_idx_train <- sample(pca_idx, round(0.6*length(pca_idx)))
pca_idx_test <- setdiff(pca_idx, pca_idx_train)
bph_idx <- which(subject_type == "bph")
bph_idx_train <- sample(bph_idx, round(0.6*length(bph_idx)))
bph_idx_test <- setdiff(bph_idx, bph_idx_train)
ctrl_idx <- which(subject_type == "control")
ctrl_idx_train <- sample(ctrl_idx, round(0.6*length(ctrl_idx)))
ctrl_idx_test <- setdiff(ctrl_idx, ctrl_idx_train)
train_idx <- c(pca_idx_train, bph_idx_train, ctrl_idx_train)
test_idx <- c(pca_idx_test, bph_idx_test, ctrl_idx_test)
# use the indexes to split the matrix into train and test
intensity_trn <- intensity[train_idx,]
intensity_tst <- intensity[test_idx,]
# use the indexes to split the labels
subject_type_trn <- subject_type[train_idx]
subject_type_tst <-subject_type[test_idx]
message("Number of samples in training: ", nrow(intensity_trn))
message("Number of samples in test: ", nrow(intensity_tst))
```
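For reference, caret can perform the same kind of stratified split in a single call. This is a sketch only (the rest of the lab keeps using the indexes built above); the seed is an arbitrary value added so the sketch is reproducible:
```{r}
set.seed(123)  # arbitrary seed, only for reproducibility of this sketch
train_idx_caret <- caret::createDataPartition(subject_type, p = 0.6, list = FALSE)[, 1]
table(subject_type[train_idx_caret])  # roughly 60% of each class
```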
# kNN in the full input space
Let us first apply a kNN classifier in the raw input space.
```{r}
subject_type_tst_knn_pred <- class::knn(train = intensity_trn,
test = intensity_tst,
cl = subject_type_trn, k = 5)
confmat_knn <- table(subject_type_tst, subject_type_tst_knn_pred)
print(confmat_knn)
CR_knn <- sum(diag(confmat_knn))/sum(confmat_knn)
message("The classification rate for kNN is: ", 100*round(CR_knn, 2), "%")
```
# Nearest Centroid Classifier
In this section we are going to build a Nearest Centroid classifier using the 'lolR' package. First, let us produce a scatter plot of two arbitrary m/z variables.
```{r}
X <- intensity_trn
Y <- subject_type_trn
datalab1<-data.frame(x1=X[,2000],x2=X[,2010],y=Y)
datalab1$y<-factor(datalab1$y)
ggplot(datalab1, aes(x = x1, y = x2, color = y)) +
geom_point() +
labs(color = "Status") +
xlab("x1") +
ylab("x2") +
ggtitle("Prostate data")
```
Now we estimate the class centers with the Nearest Centroid Classifier
```{r}
classifier <- lol.classify.nearestCentroid(X,Y)
```
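For intuition: the nearest centroid classifier simply stores one mean spectrum per class and later assigns each new sample to the class whose mean is closest. The following sketch computes the per-class means directly, just to illustrate the idea (we keep using the lolR classifier for the rest of the lab):
```{r}
# One mean spectrum (centroid) per class, computed directly from the training data:
manual_centroids <- sapply(levels(Y), function(cl) colMeans(X[Y == cl, , drop = FALSE]))
dim(manual_centroids)  # one row per m/z variable, one column per class
```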
Now let’s plot the centroids on the same scatter plot. Please take into account that this is only a partial representation of the data, since the real dimensionality is much higher.
```{r}
datalab11 <- cbind(datalab1,data.frame(size = 1))
datalab111 <- rbind(datalab11,data.frame(x1 = classifier$centroids[,2000],
x2 = classifier$centroids[,2010],
y = classifier$ylabs,
size = 5))
ggplot(datalab111, aes(x=x1, y=x2, color=y, size=size)) +
geom_point() +
xlab("x1") +
ylab("x2") +
ggtitle("Data with estimated Centers") +
guides(size = "none")
```
Let us now predict training data with the nearest centroid classifier
```{r}
Yhat <- predict(classifier,X)
datalab111$y[1:(length(datalab111$y)-3)] <- Yhat
ggplot(datalab111,aes(x=x1,y=x2, color=y,size=size))+
geom_point()+
xlab("x1")+
ylab("x2")+
ggtitle("Training Data with Predictions")+
guides(size = "none")
```
Finally we can assess the performance of the classifier on the test set.
```{r}
subject_type_tst_NC_pred<-predict(classifier,intensity_tst)
confmat_NC <- table(subject_type_tst, subject_type_tst_NC_pred)
print(confmat_NC)
CR_NC <- sum(diag(confmat_NC))/sum(confmat_NC)
message("The classification rate for NC is: ", 100 * round(CR_NC, 2), "%")
```
Take into account that the classifier operates in the full space, not just in the projection that we have plotted.
# Feature Extraction by Principal Component Analysis
In most cases, classifiers are not built on the raw data, because many dimensions carry no discriminant information.
In this section we will explore dimensionality reduction by Principal Component Analysis (PCA). PCA is an unsupervised technique that returns directions (eigenvectors) that explain the maximum data variance while being orthogonal to one another. While PCA is optimal for compressing the data into a lower dimensionality, we have to be aware that maximum variance does not imply maximum discriminability. In other words, PCA may not be the optimal procedure to reduce the dimensionality. It is, however, the default technique for initial data exploration and pattern recognition design.
$D = S L^T + E$
PCA will find a new basis for our data in which each direction maximizes the explained variance. The change-of-basis matrix $L$ contains the PCA loadings, while the projections of our data onto this new basis are the scores $S$.
If we truncate the PCA decomposition to a number of principal components npc, the product $S L^T$ will no longer reproduce $D$ exactly, and the difference is the matrix of residuals $E$.
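As a quick illustration of this decomposition, here is a sketch using base R's prcomp (not the lab's PCA function): truncating to an arbitrary number of components leaves a residual matrix E whose entries shrink as more components are kept.
```{r}
# Mean-centre the training data and decompose it with prcomp (base R PCA):
D <- scale(intensity_trn, center = TRUE, scale = FALSE)
pc <- prcomp(D, center = FALSE, scale. = FALSE)
npc <- 10                   # arbitrary truncation, for illustration only
S <- pc$x[, 1:npc]          # scores
L <- pc$rotation[, 1:npc]   # loadings
E <- D - S %*% t(L)         # residuals of the truncated reconstruction
max(abs(E))                 # decreases towards 0 as npc grows
```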
We will use the PCA function from the ChemometricsWithR package.
PCA needs to be applied to a matrix where each feature has zero mean.
```{r}
intensity_trn_preproc <- scale(intensity_trn, center = TRUE,
scale = FALSE)
mean_spectrum_trn <- attr(intensity_trn_preproc, 'scaled:center')
```
The test data is centered using the mean spectrum computed from the train data:
```{r}
intensity_tst_preproc <- scale(intensity_tst,
center = mean_spectrum_trn,
scale = FALSE)
```
We can now perform the dimensionality reduction and observe how the variance is distributed among the first principal components:
```{r}
pca_model <- PCA(intensity_trn_preproc)
summary(pca_model)
```
```{r}
pca_model_var <- variances(pca_model)
pca_model_var_percent <- 100 * cumsum(pca_model_var)/sum(pca_model_var)
plot(x = 1:length(pca_model_var_percent),
y = pca_model_var_percent, type = "l",
xlab = "Number of principal components",
ylab = "Cummulated variance (%)",
main = "% variance captured vs #PC used")
```
Based on the knee of the plot, we choose the number of components of the PCA space.
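Instead of (or in addition to) eyeballing the knee, we can, for example, pick the smallest number of components that reaches a chosen cumulative-variance threshold; the 90% below is an arbitrary value used only as a sketch:
```{r}
# Smallest number of principal components explaining at least 90% of the variance:
which(pca_model_var_percent >= 90)[1]
```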
We can plot the projection of the data onto the plane of maximum variance.
**Do you think that the classes have a Gaussian distribution?**
```{r}
scoreplot(pca_model,
pch = as.integer(subject_type_trn),
col = as.integer(subject_type_trn))
legend("topright",
legend = levels(subject_type_trn),
pch = 1:nlevels(subject_type_trn),
col = 1:nlevels(subject_type_trn))
```
```{r}
# BiocManager::install("mixOmics")
library(mixOmics)
mixPCA <- mixOmics::pca(as.data.frame(intensity_trn_preproc), ncomp = 2, scale = FALSE)
mixOmics::plotIndiv(mixPCA,
group = as.integer(subject_type_trn),
ind.names = as.integer(subject_type_trn),
legend = TRUE,
ellipse = TRUE,
ellipse.level = 0.95,
col.per.group = c("green", "darkred", "cyan")
)
```
```{r}
plotLoadings(mixPCA, comp = 1, ndisplay = 20)
```
We can retrieve the new feature vectors after dimensionality reduction. In other words, these are the projections of the feature vectors onto the space spanned by the first eigenvectors of the covariance matrix; those eigenvectors are now the new basis. For classifier design, the number of components needs to be optimized, and it is always advisable to have many more samples than dimensions. Finally, we also project the test data onto the model computed with the train data.
```{r}
pca_model_ncomp <- 50
intensity_scores_trn <- project(pca_model, npc = pca_model_ncomp, intensity_trn_preproc)
intensity_scores_test <- project(pca_model, npc = pca_model_ncomp, intensity_tst_preproc)
```
We can visualize how the training data is distributed along the different principal components (beyond the first two components):
```{r}
pairs(intensity_scores_trn[,1:5],
labels = paste("PC", 1:5),
pch = as.integer(subject_type_trn),
col = as.integer(subject_type_trn))
```
**Is any of these first 5 principal components able to discriminate sample types?**
We can also visualize the loadings of the PCA model. These loadings highlight regions of the mass spectra that co-vary and together explain most of the variance. Let's extract the first two loading vectors and plot them.
By looking at the first loading, we can see which variables contribute most to the scores of the first principal component.
```{r}
pca_model_loadings <- loadings(pca_model, pca_model_ncomp)
matplot(pca_model_loadings[,c(1,2)], col = c(1,2), type = 'l')
```
If no variable stands out clearly from the rest, it means that the first (and second) loadings are spread across many variables, and their interpretation may be harder than what we can cover in this session.
# Nearest Neighbour Classifier on the PCA space
A nearest neighbour classifier is a non-parametric method used for pattern recognition and classification. It classifies new samples based on the classes of the k closest training samples.
The k-NN classifier is implemented in the class package.
Given the training samples, their classes and the number of neighbours to consider, kNN predicts the classes of the test set.
We will apply the k-NN classifier in the PCA space.
```{r}
library("class")
```
We can calculate the confusion matrix. Take some time to understand this table.
**What groups of samples is the kNN misclassifying?**
```{r}
subject_type_tst_pca_knn_pred <- class::knn(train = intensity_scores_trn,
test = intensity_scores_test,
cl = subject_type_trn,
k = 5) # k = 3
confmat_pca_knn <- table(subject_type_tst, subject_type_tst_pca_knn_pred)
print(confmat_pca_knn)
CR_pca_knn <- sum(diag(confmat_pca_knn))/sum(confmat_pca_knn)
message(CR_pca_knn)
message("The classification rate for PCA-kNN is: ", 100*round(CR_pca_knn, 2), "%")
```
```{r}
?binom.test
number_of_observations <- sum(confmat_pca_knn)
number_of_hits <- sum(diag(confmat_pca_knn))
probability <- number_of_hits/number_of_observations # 0.75
probability
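# binom.test also returns a 95% confidence interval for the observed classification rate;
# this interval depends only on the counts of hits and observations, not on the hypothesized
# probability passed as the third argument (which only affects the p-value).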
confident_level <- binom.test(number_of_hits, number_of_observations, probability, conf.level = 0.95)
confident_level$conf.int
```
```{r, fig.height=5}
# Let us project the centroids onto the score plot. The centroids live in the full
# m/z space, so we centre them with the training mean before projecting:
PCA_centroids <- project(pca_model, npc = 2,
                         scale(classifier[["centroids"]], center = mean_spectrum_trn, scale = FALSE))
scoreplot(pca_model, pch = as.integer(subject_type_trn),
col = as.integer(subject_type_trn))
legend("topright",
legend = levels(subject_type_trn),
pch = 1:nlevels(subject_type_trn),
col = 1:nlevels(subject_type_trn))
points(PCA_centroids[,1],
PCA_centroids[,2],
pch = 20,
col = as.factor(classifier[["ylabs"]]),
cex = 1.5)
```
# Parameter optimization
In the previous scenario we performed the classification using the PCA projection with 50 principal components. However, we have not checked how the performance of the model (in our case, the classification rate) is affected by changing the number of principal components.
```{r}
# We will try using 1 to 100 principal components:
max_pca_ncomp <- 100
```
We cannot simply change the number of principal components above and check the classification rate on the test set. If we did that, we would be using the test subset to select the best parameters of the model, so we would no longer have a truly blind test subset.
As explained in class, the right approach is to split the train subset into two groups: one used to train models with different numbers of principal components, and another used to evaluate them and choose the one with the best classification rate. The best model can then be applied to the blind test samples to estimate its performance.
With so many partitions, the number of samples available for training the models starts to become too small. To make our results robust to the random variability of how we split the train subset, we repeat the computation several times, over several iterations, using cross-validation.
We will use k-fold validation and split the train subset into 4 folds. For each fold and each number of principal components we will train a PCA-kNN model and compute its classification rate, obtaining 4 values (one per fold) for each number of principal components.
```{r}
# To perform the cross-validation we only use the TRAIN subset, which we
# divide into cv_train and cv_test subsets.
# Partition the train subset into 4 groups:
kfolds <- 4
pca_idx_cv <- caret::createFolds(y = pca_idx_train, k = kfolds, returnTrain = TRUE)
bph_idx_cv <- caret::createFolds(y = bph_idx_train, k = kfolds, returnTrain = TRUE)
ctrl_idx_cv <- caret::createFolds(y = ctrl_idx_train, k = kfolds, returnTrain = TRUE)
classification_rates <- matrix(0, nrow = kfolds, ncol = max_pca_ncomp)
for (iter in 1:kfolds){
cv_train_idx <- c(pca_idx_train[pca_idx_cv[[iter]]],
bph_idx_train[bph_idx_cv[[iter]]], ctrl_idx_train[ctrl_idx_cv[[iter]]])
cv_test_idx <- setdiff(train_idx, cv_train_idx)
# Get the cv_train and cv_test matrices, with their labels:
intensity_cv_trn <- intensity[cv_train_idx,]
intensity_cv_tst <- intensity[cv_test_idx,]
subject_type_cv_trn <- subject_type[cv_train_idx]
subject_type_cv_tst <- subject_type[cv_test_idx]
# Preprocessing, like above:
intensity_cv_trn_preproc <- scale(intensity_cv_trn, center = TRUE, scale = FALSE)
mean_spectrum_cv_trn <- attr(intensity_cv_trn_preproc, 'scaled:center')
intensity_cv_tst_preproc <- scale(intensity_cv_tst, center = mean_spectrum_cv_trn, scale = FALSE)
# Train PCA
pca_cv_model <- ChemometricsWithR::PCA(intensity_cv_trn_preproc)
for (pca_model_ncomp_cv in 1:max_pca_ncomp) {
# Get the PCA scores for the cv_train and cv_test subsets:
intensity_scores_cv_trn <- ChemometricsWithR::scores(pca_cv_model, npc = pca_model_ncomp_cv)
intensity_scores_cv_test <- ChemometricsWithR::project(pca_cv_model, npc = pca_model_ncomp_cv, intensity_cv_tst_preproc)
# Train the knn with the PCA scores, using only the cv_train subset
subject_type_tst_pca_knn_pred <- class::knn(train = intensity_scores_cv_trn,
test = intensity_scores_cv_test,
cl = subject_type_cv_trn, k = 5)
confmat_pca_knn <- table(subject_type_cv_tst, subject_type_tst_pca_knn_pred)
CR_pca_knn_cv <- sum(diag(confmat_pca_knn))/sum(confmat_pca_knn)
message("The classification rate for PCA-kNN is: ", 100*round(CR_pca_knn_cv, 2), "%")
# Store the classification rate for this k-fold iteration and this number of principal components:
classification_rates[iter, pca_model_ncomp_cv] <- CR_pca_knn_cv
}
}
```
```{r}
head(as.data.frame(classification_rates))
```
```{r}
# Plot the classification rates obtained in each k-fold iteration for all
# the numbers of principal components tested:
matplot(x = 1:max_pca_ncomp, y = t(classification_rates), type = "l",
        col = c("red", "darkgreen", "blue", "black"), lty = "solid",
        xlab = "Number of principal components", ylab = "Classification rate")
legend("bottomright", legend = c("It. 1", "It. 2", "It. 3", "It. 4"),
col = c("red", "darkgreen", "blue", "black"), lty = "solid")
# From the plot above we can choose the number of principal components that we prefer
# and build the final model again. (50 was a fine value)
# Our estimation of the model performance choosing 50 principal components can
# be represented as a single point in the plot:
points(x = pca_model_ncomp, y = CR_pca_knn, col = "black", cex = 2, pch = 21, bg = "black")
pca_model_ncomp
CR_pca_knn
```
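For reference, a compact alternative (a sketch, not the lab's procedure) is caret::train, which can cross-validate a kNN model with PCA preprocessing in a few lines. Note that caret chooses the number of components through a cumulative-variance threshold rather than scanning 1 to 100 components; the chunk is left unevaluated on purpose:
```{r, eval=FALSE}
# 4-fold cross-validation of a centred-PCA + kNN pipeline with caret (sketch only):
ctrl_cv <- caret::trainControl(method = "cv", number = 4,
                               preProcOptions = list(thresh = 0.95))  # keep 95% of the variance
knn_pca_fit <- caret::train(x = intensity_trn, y = subject_type_trn,
                            method = "knn",
                            preProcess = c("center", "pca"),
                            tuneGrid = data.frame(k = 5),
                            trControl = ctrl_cv)
knn_pca_fit
```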
**Further questions**
What would have happened if the researchers had not considered including benign prostatic hyperplasia subjects in the experimental design?
What would our prediction be if we built a model without benign prostatic hyperplasia samples and later tried to predict a patient who suffered from that condition?
How do the predictions compare between internal validation and external validation? Surprisingly, the results seem to be better in external validation than in internal validation. Can you reason why this might happen?
In this case, dimensionality reduction does not improve the performance of the classifier. Do you think this behaviour would also occur with more complex classifiers?