Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental resolving packages #204

Merged
merged 6 commits into from
Feb 12, 2020

Conversation

renkun-ken
Copy link
Member

@renkun-ken renkun-ken commented Feb 10, 2020

Closes #203

This PR implements incremental package resolution so that resolving are done if and only if there are new packages appearing in the document that are not loaded in workspace.

This reduces all unnecessary package resolving on didOpen and didSave, significantly reduces the time of parse task in most cases, and also allows live loading packages on document didChange without having to didSave (e.g. editing a temporary untitled document not intended to save on disk)

@randy3k
Copy link
Member

randy3k commented Feb 10, 2020

Do we actually want to keep the information of loaded libraries in document$parse_data? In this way, when a document is closed, we could remove those loaded libraries from the workspace via a new function workspace$close_document.

@renkun-ken
Copy link
Member Author

Suppose we opened two documents using the following libraries:

Document 1:

library(foo) # which causes library(bar)

Document 2:

library(bar)

Then document1's loaded libraries should be both foo and bar, and document2's should be bar, and loaded libraries in workspace should be foo and bar. However, removing any of the loaded libraries from workspace when either document is closed looks undesired.

If we store loaded libraries per document, it looks like we should change how guess_namespace() works to always creating a union of loaded libraries of all documents in workspace or iterating over all documents to guess namespace. Also, storing loaded libraries per document requires resolving packages per document like before.

@randy3k
Copy link
Member

randy3k commented Feb 10, 2020

I think that we could still cache the list of loaded libraries in workspace$loaded_packages. After a document is closed, the cache will be invalidated and needs to update again.

@renkun-ken
Copy link
Member Author

Do you mean we could avoid paying the cost of unnecessary resolving (which is quite expensive) while keep loaded libraries stored per document at the same time?

@randy3k
Copy link
Member

randy3k commented Feb 10, 2020

I think that's already been addressed in this PR.

What I meant is, each document should keep a record of this loaded libraries document$parse_data$loaded_packages

Then there is also workspace$loaded_packages which is a cache of packages computed from document$parse_data$loaded_packages.

When a document is closed, the cached will be invalided and we should recompute the cache from document$parse_data$loaded_packages

In this way, we will be able to remove unnecessary attached libraries.

@renkun-ken
Copy link
Member Author

renkun-ken commented Feb 11, 2020

Let's do an experiment for this. Suppose the following doc: load packages

doc1: A B
doc2: B
doc3: C

open doc1: doc1$loaded_packages=+A,+B, workspace$loaded_packages=A,B
open doc2: doc2$loaded_packages=A,B, workspace$loaded_packages=A,B
open doc3: doc3$loaded_packages=A,B,+C,workspace$loaded_packages=A,B,C
close doc3: recompute workspace$loaded_packages=A,B

This could work correctly. However, changing or saving documents may populate workspace$loaded_packages to all document$loaded_packages, making it impossible to remove unused packages.

open doc1: doc1$loaded_packages=+A,+B, workspace$loaded_packages=A,B
open doc2: doc2$loaded_packages=A,B, workspace$loaded_packages=A,B
open doc3: doc3$loaded_packages=A,B,+C,workspace$loaded_packages=A,B,C
change doc1: doc1$loaded_packages=A,B,C, workspace$loaded_packages=A,B,C
change doc2: doc2$loaded_packages=A,B,C, workspace$loaded_packages=A,B,C
close doc3: recompute workspace$loaded_packages=A,B,C

@randy3k
Copy link
Member

randy3k commented Feb 11, 2020

changing or saving documents may populate workspace$loaded_packages to all document$loaded_packages

I don't think I totally understand it. I imagine that document$loaded_packages could be only modified by the parse_callback. If there any changes in document$loaded_packages, we'll need to update the cache too.

@renkun-ken
Copy link
Member Author

renkun-ken commented Feb 11, 2020

I may misunderstand what you mean by document$loaded_libraries? Where does this come from? Does it come from resolving env$packages in parse_document regardless the packages workspace has loaded in workspace$loaded_packages?

@randy3k
Copy link
Member

randy3k commented Feb 11, 2020

document$loaded_packages should be a list of packages inside library() or require() calls. The result should be a superset of env$package of parse_document because it also includes the resolved dependencies.

@renkun-ken
Copy link
Member Author

Then the problem is that when we resolve env$packages it becomes more packages (e.g. tidyverse) since attaching some packages lead to attaching other packages.

Suppose there's library(A) in doc1 but A depends on B:

doc1: A -> B

Then doc1$loaded_packages should be A or A,B? In parse_document, env$packages is initially A, then resolve_attached_packages(env$packages) returns A,B. If it should be A,B, then resolving cannot be done in an incremental way, which has poor performance, and this PR is trying to solve this. But if it should be A, then workspace$load_packages is computed from document$loaded_packages and will not have B at all.

@randy3k
Copy link
Member

randy3k commented Feb 11, 2020

doc1$loaded_packages should be c("A", "B"). Why resolving cannot be done in an incremental way?

Initially, doc1$loaded_packages is empty, we pass it to parse_document and compare it to env$package which is "A" - run resolve_attached_packages on setdiff(env$package, doc1$loaded_package) and append it to env$package. Back to parse_callbeck, we set doc1$loaded_packages to env$package.

Am I missing something?

@renkun-ken
Copy link
Member Author

Yes, just got your point.

You mean we could do it incrementally per document. This PR has done it on workspace level.
Per document makes good sense. Let me refine the code.

@@ -51,7 +51,7 @@ constant_completion <- function(token) {
#' Complete a package name
#' @keywords internal
package_completion <- function(token) {
installed_packages <- rownames(utils::installed.packages())
installed_packages <- .packages(all.available = TRUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont even know this thing exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know either. .packages(all.available = TRUE) should be the fastest way to get all installed packages, find.package() should be the fastest way to check if a package exists. utils::installed.packages() reads all package metadata and thus has much more overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right

(randyimac)-~$ Rscript -e "system.time(for(i in 1:100) .packages(all.available = TRUE))"
   user  system elapsed
  0.088   0.142   0.232
(randyimac)-~$ Rscript -e "system.time(for(i in 1:100) rownames(utils::installed.packages()))"
   user  system elapsed
  0.272   0.047   0.320

@randy3k
Copy link
Member

randy3k commented Feb 11, 2020

LGTM, except that you need to devtools::document() again.

@renkun-ken
Copy link
Member Author

Latest roxygen2 7.0+ will create documentation for R6 classes and produce a lot of warnings with the current docs. I reverted to roxygen 6.1.1 to redoc.

@renkun-ken renkun-ken merged commit 58a826b into REditorSupport:master Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary resolve on didOpen and didSave
2 participants