Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade of FactomineR breaks the tools gsc_high_dimentionnal_visualisation #649

Open
drosofff opened this issue Oct 20, 2023 · 3 comments
Open

Comments

@drosofff
Copy link
Member

Current situation

The tool has the following dependencies and passes all tests

        <requirement type="package" version="1.6.2=r35h6115d3f_0">r-optparse</requirement>
        <requirement type="package" version="1.42=r35h6115d3f_0">r-factominer</requirement>
        <requirement type="package" version="1.0.5">r-factoextra</requirement>
        <requirement type="package" version="0.15=r351he1b5a44_0">r-rtsne</requirement>
        <requirement type="package" version="0.4.7=r351h6115d3f_0">r-ggfortify</requirement>
        <requirement type="package" version="1.1.9=r351h0357c0b_0">r-clusterr</requirement>
        <requirement type="package" version="1.2.5=r35h6115d3f_0">r-polychrome</requirement>

with upgrade

the #648 PR upgrade, the R environment is turned into 4.3.1 with the following conda packaging

        <requirement type="package" version="1.7.3=r43hc72bb7e_2">r-optparse</requirement>
        <requirement type="package" version="2.9=r43h57805ef_0">r-factominer</requirement>
        <requirement type="package" version="1.0.7=r43hc72bb7e_3">r-factoextra</requirement>
        <requirement type="package" version="0.16=r43h7ce84a7_2">r-rtsne</requirement>
        <requirement type="package" version="0.4.16=r43hc72bb7e_1">r-ggfortify</requirement>
        <requirement type="package" version="1.3.1=r43h08d816e_1">r-clusterr</requirement>
        <requirement type="package" version="1.5.1=r43hc72bb7e_2">r-polychrome</requirement>

I have adapted the code for testing in Rstudio, and could find that there is a problem with the plot.PCA wrapped function of factominer.

Typically,

  if (opt$labels == FALSE) {
    plot(pca, axes = c(opt$PCA_x_axis, opt$PCA_y_axis), label = "none", col.ind = factor_cols)
    } else {
    plot(pca, axes = c(opt$PCA_x_axis, opt$PCA_y_axis), cex = 0.2, col.ind = factor_cols)
  }

return errors of type

>     plot(pca, axes = c(opt$PCA_x_axis, opt$PCA_y_axis), label = "none", col.ind = rep("skyblue", 1033))
Error in `geom_point()`:
! Problem while setting up geom aesthetics.
ℹ Error occurred in the 3rd layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1033)
✖ Fix the following mappings: `colour`
ℹ Error occurred in the 3rd layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1033)
✖ Fix the following mappings: `colour`
---
Backtrace:
     ▆
  1. ├─base (local) `<fn>`(x)
  2. └─ggplot2:::print.ggplot(x)
  3.   ├─ggplot2::ggplot_build(x)
  4.   └─ggplot2:::ggplot_build.ggplot(x)
  5.     └─ggplot2:::by_layer(...)
  6.       ├─rlang::try_fetch(...)
  7.       │ ├─base::tryCatch(...)
  8.       │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
  9.       │ │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 10.       │ │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
 11.       │ └─base::withCallingHandlers(...)
 12.       └─ggplot2 (local) f(l = layers[[i]], d = data[[i]])
 13.         └─l$compute_geom_2(d)
 14.           └─ggplot2 (local) compute_geom_2(..., self = self)
 15.             └─self$geom$use_defaults(data, self$aes_params, modifiers)
 16.               └─ggplot2 (local) use_defaults(..., self = self)
 17.                 └─ggplot2:::check_aesthetics(params[aes_params], nrow(data))

Help !

@bellenger-l I could detemine that the faulty parameter is col.ind but coud not figure out why. Ok, there is a problem of lenght mapping, but even if you manually force the factor_cols to have the length asked by R (1033), it is still poping up the same complaint !

Could you look at this and help me to fix, @bellenger-l ?

I have also noted the existence of other possible parameter like habillage and col.hav or col.var, etc...

last but not least, in this code there is something "odd":

ok for the transposition of the dataframe ! cells are observations and gene are the real variables.
But then, why are the number of colors of observations is determined by (for instance) line 328:

factor_cols <- rep("deepskyblue4", length(rownames(data)))

since at this (early) stage of the script, columns are still observations ????

By the way, as you see, factomineR was (very) recently upgraded !!

@drosofff
Copy link
Member Author

For convenience, here is the code adapted to Rstudio (getting rid of optparse stuff)

# libraries
library(FactoMineR)
library(factoextra)
library(Rtsne)
library(ggplot2)
library(ggfortify)
library(RColorBrewer)
library(ClusterR)
library(data.table)
library(Polychrome)

# manual options
opt <- list()

opt$data <- "cpm_input.tsv"
opt$sep <- "\t"
opt$colnames <- TRUE
opt$out <- "res.tab"
opt$labels <- FALSE
opt$factor <- ""  # "numeric_factor.tsv"
opt$visu_choice <- "PCA"
opt$table_coordinates <- ""
opt$Rtsne_seed <- 42
opt$Rtsne_dims <- 2
opt$Rtsne_initial_dims <- 50
opt$Rtsne_perplexity <- 5.0
opt$Rtsne_theta <- 1.0
opt$Rtsne_max_iter <- 1000
opt$Rtsne_pca <- TRUE
opt$Rtsne_pca_center <- TRUE
opt$Rtsne_pca_scale <- FALSE
opt$Rtsne_normalize <- TRUE
opt$Rtsne_exaggeration_factor <- 12.0
opt$PCA_npc <- 5
opt$PCA_x_axis <- 1
opt$PCA_y_axis <- 2
opt$HCPC_ncluster <- -1
opt$HCPC_npc <- 5
opt$HCPC_metric <- "euclidean"
opt$HCPC_method <- "ward"
opt$pdf_out <- "pca.nolabels.pdf"  # "pca.nolabels.numerical-factor.pdf" "pca.labels.pdf"
opt$HCPC_consol <- TRUE
opt$HCPC_itermax <- 10
opt$HCPC_min <- 3
opt$HCPC_max <- -1
opt$HCPC_clusterCA <- "rows"
opt$HCPC_kk <- Inf
opt$HCPC_clust <- ""
opt$HCPC_mutual_info <- ""
opt$HCPC_cluster_description <- ""

if (opt$sep == "tab") {
  opt$sep <- "\t"
}
if (opt$sep == "comma") {
  opt$sep <- ","
}
if (opt$HCPC_max == -1) {
  opt$HCPC_max <- NULL
}
if (opt$HCPC_kk == -1) {
  opt$HCPC_kk <- Inf
}

##Add legend to plot()
legend_col <- function(col, lev) {
  
  opar <- par
  
  n <- length(col)
  
  bx <- par("usr")
  
  box_cx <- c(bx[2] + (bx[2] - bx[1]) / 1000,
              bx[2] + (bx[2] - bx[1]) / 1000 + (bx[2] - bx[1]) / 50)
  box_cy <- c(bx[3], bx[3])
  box_sy <- (bx[4] - bx[3]) / n
  
  xx <- rep(box_cx, each = 2)
  
  par(xpd = TRUE)
  for (i in 1:n) {
    yy <- c(box_cy[1] + (box_sy * (i - 1)),
            box_cy[1] + (box_sy * (i)),
            box_cy[1] + (box_sy * (i)),
            box_cy[1] + (box_sy * (i - 1)))
    polygon(xx, yy, col = col[i], border = col[i])
  }
  par(new = TRUE)
  plot(0, 0, type = "n",
       ylim = c(min(lev), max(lev)),
       yaxt = "n", ylab = "",
       xaxt = "n", xlab = "",
       frame.plot = FALSE)
  axis(side = 4, las = 2, tick = FALSE, line = .25)
  par <- opar
}

data <- read.delim(
  opt$data,
  check.names = FALSE,
  header = opt$colnames,
  row.names = 1,
  sep = opt$sep
)

# Contrasting factor and its colors
if (opt$factor != "") {
  contrasting_factor <- read.delim(
    opt$factor,
    header = TRUE
  )
  rownames(contrasting_factor) <- contrasting_factor[, 1]
  contrasting_factor <- contrasting_factor[colnames(data), ]
  colnames(contrasting_factor) <- c("id", "factor")
  if (is.numeric(contrasting_factor$factor)) {
    factor_cols <- rev(brewer.pal(n = 11, name = "RdYlGn"))[contrasting_factor$factor]
  } else {
    contrasting_factor$factor <- as.factor(contrasting_factor$factor)
    if (nlevels(contrasting_factor$factor) == 2) {
      colors_labels <- c("#E41A1C", "#377EB8")
    } else {
      set.seed(567629)
      colors_labels <- createPalette(nlevels(contrasting_factor$factor), c("#5A5156", "#E4E1E3", "#F6222E"))
      names(colors_labels) <- NULL
    }
    factor_colors <-
      with(contrasting_factor,
           data.frame(factor = levels(contrasting_factor$factor),
                      color = I(colors_labels)
           )
      )
    factor_cols <- factor_colors$color[match(contrasting_factor$factor,
                                             factor_colors$factor)]
  }
} else {
  factor_cols <- rep("deepskyblue4", length(rownames(data)))
}

################  t-SNE ####################
if (opt$visu_choice == "tSNE") {
  # filter and transpose df for tsne and pca
  tdf <- t(data)
  # make tsne and plot results
  set.seed(opt$Rtsne_seed) ## Sets seed for reproducibility
  
  tsne_out <- Rtsne(tdf,
                    dims = opt$Rtsne_dims,
                    initial_dims = opt$Rtsne_initial_dims,
                    perplexity = opt$Rtsne_perplexity,
                    theta = opt$Rtsne_theta,
                    max_iter = opt$Rtsne_max_iter,
                    pca = opt$Rtsne_pca,
                    pca_center = opt$Rtsne_pca_center,
                    pca_scale = opt$Rtsne_pca_scale,
                    normalize = opt$Rtsne_normalize,
                    exaggeration_factor = opt$Rtsne_exaggeration_factor)
  
  embedding <- as.data.frame(tsne_out$Y[, 1:2])
  embedding$Class <- as.factor(rownames(tdf))
  gg_legend <- theme(legend.position = "right")
  if (opt$factor == "") {
    ggplot(embedding, aes(x = V1, y = V2)) +
      geom_point(size = 1, color = "deepskyblue4") +
      gg_legend +
      xlab("t-SNE 1") +
      ylab("t-SNE 2") +
      ggtitle("t-SNE") +
      if (opt$labels) {
        geom_text(aes(label = Class), hjust = -0.2, vjust = -0.5, size = 1.5, color = "deepskyblue4")
      }
  } else {
    if (is.numeric(contrasting_factor$factor)) {
      embedding$factor <- contrasting_factor$factor
    } else {
      embedding$factor <- as.factor(contrasting_factor$factor)
    }
    
    ggplot(embedding, aes(x = V1, y = V2, color = factor)) +
      geom_point(size = 1) +
      gg_legend +
      xlab("t-SNE 1") +
      ylab("t-SNE 2") +
      ggtitle("t-SNE") +
      if (opt$labels) {
        geom_text(aes(label = Class, colour = factor), hjust = -0.2, vjust = -0.5, size = 1.5)
      }
  }
  ggsave(file = opt$pdf_out, device = "pdf")
  
  #save coordinates table
  if (opt$table_coordinates != "") {
    coord_table <- cbind(rownames(tdf), round(as.data.frame(tsne_out$Y), 6))
    colnames(coord_table) <- c("Cells", paste0("DIM", (1:opt$Rtsne_dims)))
  }
}


######### make PCA with FactoMineR #################
if (opt$visu_choice == "PCA") {
  pca <- PCA(t(data), ncp = opt$PCA_npc, graph = FALSE)
  pdf(opt$pdf_out)
  if (opt$labels == FALSE) {
    plot(pca, axes = c(opt$PCA_x_axis, opt$PCA_y_axis), label = "none", col.ind = factor_cols)
  } else {
    plot(pca, axes = c(opt$PCA_x_axis, opt$PCA_y_axis), cex = 0.2, col.ind = factor_cols)
  }
  if (opt$factor != "") {
    if (is.factor(contrasting_factor$factor)) {
      legend(x = "topright",
             legend = as.character(factor_colors$factor),
             col = factor_colors$color, pch = 16, bty = "n", xjust = 1, cex = 0.7)
    } else {
      legend_col(col = rev(brewer.pal(n = 11, name = "RdYlGn")), lev = cut(contrasting_factor$factor, 11, label = FALSE))
    }
  }
  dev.off()
  
  #save coordinates table
  if (opt$table_coordinates != "") {
    coord_table <- cbind(rownames(pca$ind$coord), round(as.data.frame(pca$ind$coord), 6))
    colnames(coord_table) <- c("Cells", paste0("DIM", (1:opt$PCA_npc)))
  }
  
}

########### make HCPC with FactoMineR ##########
if (opt$visu_choice == "HCPC") {
  
  # HCPC starts with a PCA
  pca <- PCA(
    t(data),
    ncp = opt$HCPC_npc,
    graph = FALSE,
  )
  
  pca_ind_coord <- as.data.frame(pca$ind$coord) # coordinates of observations in PCA
  
  # Hierarchical Clustering On Principal Components Followed By Kmean Clustering
  res_hcpc <- HCPC(pca,
                   nb.clust = opt$HCPC_ncluster, metric = opt$HCPC_metric, method = opt$HCPC_method,
                   graph = FALSE, consol = opt$HCPC_consol, iter.max = opt$HCPC_itermax, min = opt$HCPC_min, max = opt$HCPC_max,
                   cluster.CA = opt$HCPC_clusterCA, kk = opt$HCPC_kk)
  # HCPC plots
  dims <- head(as.data.frame(res_hcpc$call$t$res$eig), 2) # dims variances in column 2
  pdf(opt$pdf_out)
  plot(res_hcpc, choice = "tree")
  plot(res_hcpc, choice = "bar")
  plot(res_hcpc, choice = "3D.map")
  if (opt$labels == FALSE) {
    plot(res_hcpc, choice = "map", label = "none")
  } else {
    plot(res_hcpc, choice = "map")
  }
  
  # user contrasts on the pca
  if (opt$factor != "") {
    plot(pca, label = "none", col.ind = factor_cols)
    if (is.factor(contrasting_factor$factor)) {
      legend(x = "topright",
             legend = as.character(factor_colors$factor),
             col = factor_colors$color, pch = 16, bty = "n", xjust = 1, cex = 0.7)
      
      ## Normalized Mutual Information
      sink(opt$HCPC_mutual_info)
      res <- external_validation(
        true_labels = as.numeric(contrasting_factor$factor),
        clusters = as.numeric(res_hcpc$data.clust$clust),
        summary_stats = TRUE
      )
      sink()
      
    } else {
      legend_col(col = rev(brewer.pal(n = 11, name = "RdYlGn")), lev = cut(contrasting_factor$factor, 11, label = FALSE))
    }
  }
  
  dev.off()
  
  if (opt$table_coordinates != "") {
    coord_table <- cbind(Cell = rownames(res_hcpc$call$X),
                         round(as.data.frame(res_hcpc$call$X[, -length(res_hcpc$call$X)]), 6),
                         as.data.frame(res_hcpc$call$X[, length(res_hcpc$call$X)])
    )
    colnames(coord_table) <- c("Cells", paste0("DIM", (1:opt$HCPC_npc)), "Cluster")
  }
  
  if (opt$HCPC_clust != "") {
    res_clustering <- data.frame(Cell = rownames(res_hcpc$data.clust),
                                 Cluster = res_hcpc$data.clust$clust)
    
  }
  
  # Description of cluster by most contributing variables / gene expressions
  
  # first transform list of vectors in a list of dataframes
  extract_description <- lapply(res_hcpc$desc.var$quanti, as.data.frame)
  # second, transfer rownames (genes) to column in the dataframe, before rbinding
  extract_description_w_genes <- Map(cbind,
                                     extract_description,
                                     genes = lapply(extract_description, rownames)
  )
  # Then use data.table to collapse all generated dataframe, with the cluster id in first column
  # using the {data.table} rbindlist function
  cluster_description <- rbindlist(extract_description_w_genes, idcol = "cluster_id")
  cluster_description <- cluster_description[, c(8, 1, 2, 3, 4, 5, 6, 7)] # reorganize columns
  
  
  # Finally, output cluster description data frame
  write.table(
    cluster_description,
    file = opt$HCPC_cluster_description,
    sep = "\t",
    quote = FALSE,
    col.names = TRUE,
    row.names = FALSE
  )
  
}

## Return coordinates file to user

if (opt$table_coordinates != "") {
  write.table(
    coord_table,
    file = opt$table_coordinates,
    sep = "\t",
    quote = FALSE,
    col.names = TRUE,
    row.names = FALSE
  )
}


if (opt$HCPC_clust != "") {
  write.table(
    res_clustering,
    file = opt$HCPC_clust,
    sep = "\t",
    quote = FALSE,
    col.names = TRUE,
    row.names = FALSE
  )
}

@bellenger-l
Copy link
Member

I think I found it !!

It seems that parameters of plot.PCA have changed.

  • col.ind must be a single color
  • col.hab is the vector of color BUT it must be combined with the argument habillage = "ind" because it's "none" by default

I don't understand why there is factor_cols <- rep("deepskyblue4", length(rownames(data))), it might be an error

@drosofff
Copy link
Member Author

yeah ! I saw the same this week end. In any case, I will completely refactor the code for PCA by this tool, because we were not using factomineR properly ! quantitative as well as qualitative factor mais be added since the beginning in the input data set and specified when the PCA function is called. From this, visualisation of results (based on ggplot2) is just a piece of cake with the PCA.plot function !

I did not have time to see, but I guess it should be the same for HCPC !

And yes, factor_cols <- rep("deepskyblue4", length(rownames(data))) is a genuine error !... but we don't care anymore since factomineR is managing this for us !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants