To have in input a vector of string of NMRExperiments #62

Open
philibe opened this issue Apr 22, 2022 · 9 comments

@philibe

philibe commented Apr 22, 2022

Hello.

Thank you very much for your powerful NMR package, which is well documented with examples, well integrated, and not too complex to use.

I saw that you implemented "Ensure unique exp names" (pull #48). It is useful, but when we have more than 100 experiments for a PCA, for example, it is hard to analyze experiment names like "10...54" instead of the real experiment names.

Here is my workaround :) (written before the pull #48 fix, but still useful for renaming duplicate names)

# my analyses (my_dir_of_NMR_analysis)
#   analysis L1 bla bla
#     10 : my_experiments[[1]]
#   analysis L2 bla bla
#     10 : my_experiments[[2]]
#     20 : my_experiments[[3]]

# List every experiment directory whose path ends in "0" (10, 20, ...)
my_experiments <- fs::dir_ls(
  my_dir_of_NMR_analysis,
  type = "directory",
  recurse = TRUE,
  glob = "*0"
)

dataset <- AlpsNMR::nmr_read_samples(my_experiments)

# New name: "<analysis folder>_exp_<experiment number>"
dataset$NMRExperiment <- paste0(
  basename(dirname(dataset$metadata$info$info_sample_path)),
  "_exp_",
  basename(dataset$metadata$info$info_sample_path)
)

# Overwrite every metadata column whose name contains "NMRExperiment"
for (metadata_name_niv1 in names(dataset$metadata)) {
  for (metadata_name_niv2 in names(dataset$metadata[[metadata_name_niv1]])) {
    if (grepl("NMRExperiment", metadata_name_niv2, ignore.case = TRUE)) {
      dataset$metadata[[metadata_name_niv1]][[metadata_name_niv2]] <- dataset$NMRExperiment
    }
  }
}
@zeehio
Member

zeehio commented Apr 24, 2022

Thanks for your feedback! If you keep your samples from different groups in different folders, then that's a way to name them. But not everyone uses that schema, and some experiment designs may be more complex and may not lend themselves to naming so easily.

I am currently heavily working on improving some parts of the package. Naming samples is important and we need to provide a convenient solution for everyone.

I could easily provide a function to set the sample names where you just give the sample names as a character vector and then AlpsNMR replaces the names where needed.

I could provide an additional helper to allow the renaming of the sample based on the sample path. I will explore some options, provide you some solutions and if they work for you then I will merge them for everyone. I hope to find the time to implement this soon.

@zeehio zeehio self-assigned this Apr 24, 2022
@zeehio
Member

zeehio commented Jun 23, 2022

I've implemented two functions that will be helpful:

  • names(<nmr_dataset>) allows you to get/set sample names
  • nmr_read_samples(....) has a smarter approach to set the default sample names.

For instance, if the sample names are repeated, in different subfolders, such as:

- your_dataset/
  + control/
     * 10/
     * 20/
     * 30/
  + mutated/
     * 10/
     * 20/
     * 30/

Using:

```r
dataset <- nmr_read_samples_dir(c("your_dataset/control", "your_dataset/mutated"))
dataset
```

Will give you as sample names control/10, control/20, control/30, mutated/10, mutated/20, mutated/30.

I hope this behaves as you expect. You can always choose your own names with names()<-

I will close this issue in a few days. Feel free to comment or close it yourself if you like.

Have a nice day!

@philibe
Author

philibe commented Jun 23, 2022

Thanks. I will look at your new feature next week. :)

@philibe
Author

philibe commented Jun 27, 2022

I cannot upgrade AlpsNMR right now because I do not have R >= 4.2 yet (I have R 4.1.2).

I will look at it later.

  • names(<nmr_dataset>) could be useful, if I can use it to get the same result as in my example.
  • nmr_read_samples_dir(c("your_dataset/control", "your_dataset/mutated")) could be useful, but not for me now :) See below.

In fact I have many folders of Bruker datasets, and each folder is different. I never have the same folder names, like control or mutated. My wish would have been something like my example below:

my analyses (my_dir_of_NMR_analysis)
  analysis L1 bla bla
    10 : my_experiments[[1]], new NMRExperiment name in AlpsNMR : analysis L1 bla bla_10
  analysis L2 bla bla
    10 : my_experiments[[2]], new NMRExperiment name in AlpsNMR : analysis L2 bla bla_10
    20 : my_experiments[[3]], new NMRExperiment name in AlpsNMR : analysis L2 bla bla_20
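
The naming scheme in this listing can be sketched in base R as follows. This is only an illustration: the helper experiment_names_from_paths() is hypothetical (it is not part of AlpsNMR) and the paths are made up to mirror the layout above.

```r
# Hypothetical helper (not part of AlpsNMR): build "<analysis folder>_<experiment>"
# names from Bruker experiment paths.
experiment_names_from_paths <- function(paths, sep = "_") {
  paste(basename(dirname(paths)), basename(paths), sep = sep)
}

# Made-up paths mirroring the layout described above.
paths <- c(
  "my_dir_of_NMR_analysis/analysis L1 bla bla/10",
  "my_dir_of_NMR_analysis/analysis L2 bla bla/10",
  "my_dir_of_NMR_analysis/analysis L2 bla bla/20"
)

new_names <- experiment_names_from_paths(paths)
print(new_names)
```

The resulting character vector could then be passed to names(<nmr_dataset>) <- new_names, assuming the installed AlpsNMR version provides the names setter mentioned earlier in this thread.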

I see that you prefer to close my issue even though I cannot test yet, and to have it reopened (or a new one created) later, rather than keep too many issues open for months and months :)

Thank you anyway for your work.

@zeehio
Member

zeehio commented Jun 27, 2022

I'll try to relax the 4.2 dependency to 4.1.

Besides, would something like:

dataset <- nmr_read_samples_dir(list.dirs("your_dataset_dir/", recursive=FALSE))

list.dirs() would list all your analysis bla folders.

Would that work for you?

@philibe
Author

philibe commented Jun 27, 2022

nmr_read_samples_dir(list.dirs("your_dataset_dir/", recursive=FALSE)) : yes : It's my use case :)

@zeehio
Member

zeehio commented Jun 28, 2022

The package seems to work on R 4.1 as well. Feel free to try it.

@philibe
Author

philibe commented Jun 29, 2022

Ok thanks. I installed it.

But I think we don't understand each other :D

See below for a new explanation of my wish :)

Issue with names(<nmr_dataset>)

my_list<- fs::dir_ls( ROOT_DIRECTORY_ANALYSIS, type="directory", recurse=0)
dataset<-NULL
dataset<-AlpsNMR::nmr_read_samples_dir(
    my_list
)
names(dataset[1])<- my_list[1]
Warning message in names(dataset[1]) <- my_list[1]:number of items to replace is not a multiple of replacement length

It doesn't recurse into each NMRExperiment or info_NMRExperiment field to rename it. That is to be expected, since it's a vector.
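
For background, here is a minimal base R sketch of how a replacement call on a subset behaves. It uses a plain named list as a stand-in, not an nmr_dataset object, so whether AlpsNMR's own methods behave the same way (or accept the second form) is an assumption and not something this thread confirms.

```r
# Plain named list standing in for a dataset (not an nmr_dataset object).
x <- list(`10...1` = "spectrum A", `10...2` = "spectrum B")

# This renames a temporary one-element copy and assigns it back by position,
# so the names stored in x itself are left untouched.
names(x[1]) <- "Flower1_10"
names_after_subset <- names(x)

# This replaces the first entry of the names vector of x itself.
names(x)[1] <- "Flower1_10"
names_after_direct <- names(x)

print(names_after_subset)
print(names_after_direct)
```

With a plain list, only the second form changes the stored name. Assigning the full vector at once, as in names(x) <- c(...), avoids the question entirely.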

Issue with nmr_read_samples_dir(list.dirs("your_dataset_dir/", recursive=FALSE))

my_list<- fs::dir_ls( ROOT_DIRECTORY_ANALYSIS, type="directory", recurse=0)
nmr_read_samples_dir(my_list)
New names:
* `1` -> `1...1`
* `10` -> `10...2`
* `1` -> `1...3`
* `10` -> `10...4`
* `1` -> `1...5`

New explanation :)

Here is my current directory tree. The NMR scientist where I work says that our way of working is unusual:
we keep one experiment per analysis, so that there is only one spectrum per analysis.

directories

/mnt/rmn  <- ROOT_DIRECTORY_ANALYSIS
- /mnt/rmn/Flower1 <- bruker directory
   +  10 : Automatic name of NmrExperiment
       * pdata
       * acqus, fid, etc..

- /mnt/rmn/Flower2 <- bruker directory
   +  10 : Automatic name of NmrExperiment
       * pdata
       * acqus, fid, etc..       
- /mnt/rmn/Flower3 <- bruker directory
   +  10 : Automatic name of NmrExperiment
       * pdata
       * acqus, fid, etc..       

AlpsNMR Structure

What I would like to have in your structure, for example for the first and second samples:

dataset[1]
str(dataset[1])
List of 4
 $ metadata   :List of 7
  ..$ info    : tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment      : chr "10...1"  => to be renamed "Flower1_10"
  .. ..$ info_NMRExperiment : chr "10" => to be renamed "Flower1_10"
  .. ..$ info_file_format   : chr "Bruker NMR directory"
  .. ..$ info_sample_path   : chr "/mnt/rmn/Flower1/10"
  .. ..$ info_dimension     : int 1

  ..$ orig    : tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...1" => to be renamed "Flower1_10"
  ..$ title   : tibble [1 × 28] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment                 : chr "10...1"
  .. ..$ title_V1                      : chr "m=758.9mg eq 750ul"

  ..$ acqus   : tibble [1 × 239] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment     : chr "10...1" => to be renamed "Flower1_10"
  .. ..$ acqus_TITLE       : chr "Parameter file, TopSpin 3.5 pl 6"
  ..$ procs   : tibble [1 × 137] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment     : chr "10...1" => to be renamed "Flower1_10"

  ..$ levels  : tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...1" => to be renamed "Flower1_10"
  .. ..$ levels_levels: logi NA
  ..$ external: tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...1" => to be renamed "Flower1_10"
 $ data_1r    :List of 1
  ..$ : num [1:131072] -175 257 686 875 947 ...
 $ axis       :List of 1
  ..$ :List of 1
  .. ..$ : num [1:131072] 15.1 15.1 15.1 15.1 15.1 ...
 $ num_samples: int 1
 - attr(*, "class")= chr [1:2] "nmr_dataset" "nmr_dataset_family"
dataset[2]
str(dataset[2])
List of 4
 $ metadata   :List of 7
  ..$ info    : tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment      : chr "10...2"  => to be renamed "Flower2_10"
  .. ..$ info_NMRExperiment : chr "10" => to be renamed "Flower2_10"
  .. ..$ info_file_format   : chr "Bruker NMR directory"
  .. ..$ info_sample_path   : chr "/mnt/rmn/Flower2/10"
  .. ..$ info_dimension     : int 1

  ..$ orig    : tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...2" => to be renamed "Flower2_10"
  ..$ title   : tibble [1 × 28] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment                 : chr "10...2"
  .. ..$ title_V1                      : chr "m=758.9mg eq 750ul"

  ..$ acqus   : tibble [1 × 239] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment     : chr "10...2" => to be renamed "Flower2_10"
  .. ..$ acqus_TITLE       : chr "Parameter file, TopSpin 3.5 pl 6"
  ..$ procs   : tibble [1 × 137] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment     : chr "10...2" => to be renamed "Flower2_10"

  ..$ levels  : tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...2" => to be renamed "Flower2_10"
  .. ..$ levels_levels: logi NA
  ..$ external: tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. ..$ NMRExperiment: chr "10...2" => to be renamed "Flower2_10"
 $ data_1r    :List of 1
  ..$ : num [1:131072] -175 257 686 875 947 ...
 $ axis       :List of 1
  ..$ :List of 1
  .. ..$ : num [1:131072] 15.1 15.1 15.1 15.1 15.1 ...
 $ num_samples: int 1
 - attr(*, "class")= chr [1:2] "nmr_dataset" "nmr_dataset_family"

Apart from that, AlpsNMR works

Apart from the issue (only for us) with the names of NMRExperiments, we are used to using AlpsNMR, in particular nmr_read_samples_dir(), nmr_read_samples(), nmr_meta_add(), nmr_interpolate_1D(), nmr_pca_outliers_robust(), nmr_pca_outliers_plot(), plot(dataset), etc. :) However, the spectrum curves and points are then labelled with many names like "10..1", "10..2", "10..50". Hence my renaming loop at the beginning of this issue.

@zeehio
Member

zeehio commented Jul 8, 2022

I will prepare some unit tests to ensure your directory structure gives sensible names.

Meanwhile, I would expect something like this to work:

my_list<- fs::dir_ls( ROOT_DIRECTORY_ANALYSIS, type="directory", recurse=0)
ds <- nmr_read_samples_dir(my_list)
names(ds) <- c("Flower1", "Flower2", "Flower3")
# Or:
# names(ds) <- basename(my_list)

But I will do further testing. I'm writing this from my phone right now :-)
