-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential fixes for issue #40 (efficient sleuth object storage) #63
Conversation
Some stuff that breaks with the current update: summaries/fragment length distribution plot – "Error: kallisto object does not contain the fragment length distribution. Please rerun with a new version of kallisto." (can't tell whether this is due to my branch changes, or due to a sample set actually run with an older version of kallisto, more updates to come) Update: looks like only analyses/transcript view is broken! |
@@ -7,4 +7,4 @@ linters: with_defaults( | |||
single_quotes_linter = NULL, | |||
trailing_blank_lines_linter = NULL | |||
) | |||
exclusions: list("R/hexamers.R", "inst/doc/intro.R") | |||
exclusions: list("R/hexamers.R", "inst/doc/intro.html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change? Was this giving you issue? The linter shouldn't be checking anything other than .R files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"inst/doc/intro.R" does not exist – the intro file is an html file.
I added intro.html to the exclusions since I assumed it shouldn't be checking
the html file (and I assumed that exclusions was a list of files lintr shouldn't check)
FYI, I'm planning to do a thorough code review Thursday night. |
Last commit turned out to be calculating the bootstrap quantiles incorrectly. This commit should address that issue, and also correctly plot the boxplots of each transcript.
Hey Harold, I want to tentatively claim that everything is working. Take a look! |
bs_var | ||
})) | ||
|
||
ret$target_id <- target_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be returned in the object?
obs_counts <- obs_to_matrix(ret, "est_counts") | ||
obs_counts <- transformation_function(obs_counts) | ||
|
||
bs_test_summary$obs_counts <- obs_counts[rownames(obs_counts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it just me, or are these filter commands a bit obscure? are they not simply the same filter command?
bs_test_summary <- adf(ret$bs_summary) | ||
bs_test_summary$target_id <- target_id | ||
bs_test_summary <- bs_test_summary[order(bs_test_summary$target_id), ] | ||
bs_test_summary <- data.frame(varMeans = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that you perform some operations on this, I think it makes sense to leave it as a matrix until the last possible moment. please re factor so that you can perform all operations on the matrix, then simply have a very basic command that turns it into a data frame and adds the correct columns (target_id
)
okay, I did a very quick code review and made some minor changes. I haven't yet tested for correctness -- I will do so after you make these changes. I will probably have some minor changes after that, then I think we will be ready to merge this into the main branch. good work! also: please use 2 space tabs rather than 4. thanks! |
Hey Harold, I've been slowly working on new updates. In sleuth_live transcript view, can you justify having the different units for the boxplots? Won't the trends will be the same regardless of units? |
There are a number of caveats here, and potentially I would like to leave it as an option. Can you add an option to Thanks |
Done. The code now uses a pre-allocated matrix, does not repeat calculations, and uses for-loops. I've also fixed up some minor style problems. |
Awesome, thanks! Can you also address the comment on: https://github.com/pachterlab/sleuth/pull/63/files#diff-f7d482db290842229ad430c971048216R335 (sleuth.R line 335)? |
read_bootstrap_mat <- function(fname, num_bootstraps, num_transcripts, est_count_sf) { | ||
bs_mat <- matrix(nrow=num_bootstraps, ncol=num_transcripts) | ||
for(i in 1:nrow(bs_mat)) { | ||
bs_mat[i, ] <- rhdf5::h5read(fname, paste0("bootstrap/bs", i - 1)) / est_count_sf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: est_count_sf
usage does not look correct here. I am fairly certain it should be est_count_sf[i]
. should check for correctness. UPDATE: probably okay due to the way it is called
@psturmfels I'm having trouble running this version. Here is the error that I get:
This is the code that generated that error: library('devtools')
dev_mode()
install('../..')
library('sleuth')
data_path <- 'ellahi'
sample_ids <- grep('^SR', dir(data_path), value = TRUE)
sample_ids <- rev(sample_ids)
study_mapping <- read.table(file.path(data_path, 'study_design.txt'), header = TRUE,
stringsAsFactors = FALSE)
study_mapping <- dplyr::select(study_mapping, sample = run, condition)
stopifnot(sample_ids == study_mapping$sample)
result_paths <- file.path(data_path, sample_ids, 'kallisto')
study_mapping <- dplyr::mutate(study_mapping, path = result_paths)
so <- sleuth_prep(study_mapping, ~condition) Run from within Can you look into it, please? Thanks |
Hey Harold, With that said, I'll change the code not to rely on this assumption. |
@psturmfels I think I am comfortable merging this now. Take a look at the recent commits/modifications that I made so that you are familiar with them. In particular, I cleaned up a lot of the staff at the end (when assigning things to Anyway, I checked for correctness on 2 data sets and they seem to give the exact same values. Good work! |
Potential fixes for issue #40 (efficient sleuth object storage)
No description provided.