Incremental resolving packages #204

renkun-ken · 2020-02-10T09:30:47Z

Closes #203

This PR implements incremental package resolution so that resolving are done if and only if there are new packages appearing in the document that are not loaded in workspace.

This reduces all unnecessary package resolving on didOpen and didSave, significantly reduces the time of parse task in most cases, and also allows live loading packages on document didChange without having to didSave (e.g. editing a temporary untitled document not intended to save on disk)

randy3k · 2020-02-10T18:19:15Z

Do we actually want to keep the information of loaded libraries in document$parse_data? In this way, when a document is closed, we could remove those loaded libraries from the workspace via a new function workspace$close_document.

renkun-ken · 2020-02-10T22:52:40Z

Suppose we opened two documents using the following libraries:

Document 1:

library(foo) # which causes library(bar)

Document 2:

library(bar)

Then document1's loaded libraries should be both foo and bar, and document2's should be bar, and loaded libraries in workspace should be foo and bar. However, removing any of the loaded libraries from workspace when either document is closed looks undesired.

If we store loaded libraries per document, it looks like we should change how guess_namespace() works to always creating a union of loaded libraries of all documents in workspace or iterating over all documents to guess namespace. Also, storing loaded libraries per document requires resolving packages per document like before.

randy3k · 2020-02-10T22:59:27Z

I think that we could still cache the list of loaded libraries in workspace$loaded_packages. After a document is closed, the cache will be invalidated and needs to update again.

renkun-ken · 2020-02-10T23:03:53Z

Do you mean we could avoid paying the cost of unnecessary resolving (which is quite expensive) while keep loaded libraries stored per document at the same time?

randy3k · 2020-02-10T23:10:38Z

I think that's already been addressed in this PR.

What I meant is, each document should keep a record of this loaded libraries document$parse_data$loaded_packages

Then there is also workspace$loaded_packages which is a cache of packages computed from document$parse_data$loaded_packages.

When a document is closed, the cached will be invalided and we should recompute the cache from document$parse_data$loaded_packages

In this way, we will be able to remove unnecessary attached libraries.

renkun-ken · 2020-02-11T02:23:29Z

Let's do an experiment for this. Suppose the following doc: load packages

doc1: A B
doc2: B
doc3: C

open doc1: doc1$loaded_packages=+A,+B, workspace$loaded_packages=A,B
open doc2: doc2$loaded_packages=A,B, workspace$loaded_packages=A,B
open doc3: doc3$loaded_packages=A,B,+C,workspace$loaded_packages=A,B,C
close doc3: recompute workspace$loaded_packages=A,B

This could work correctly. However, changing or saving documents may populate workspace$loaded_packages to all document$loaded_packages, making it impossible to remove unused packages.

open doc1: doc1$loaded_packages=+A,+B, workspace$loaded_packages=A,B
open doc2: doc2$loaded_packages=A,B, workspace$loaded_packages=A,B
open doc3: doc3$loaded_packages=A,B,+C,workspace$loaded_packages=A,B,C
change doc1: doc1$loaded_packages=A,B,C, workspace$loaded_packages=A,B,C
change doc2: doc2$loaded_packages=A,B,C, workspace$loaded_packages=A,B,C
close doc3: recompute workspace$loaded_packages=A,B,C

randy3k · 2020-02-11T02:30:11Z

changing or saving documents may populate workspace$loaded_packages to all document$loaded_packages

I don't think I totally understand it. I imagine that document$loaded_packages could be only modified by the parse_callback. If there any changes in document$loaded_packages, we'll need to update the cache too.

renkun-ken · 2020-02-11T02:40:31Z

I may misunderstand what you mean by document$loaded_libraries? Where does this come from? Does it come from resolving env$packages in parse_document regardless the packages workspace has loaded in workspace$loaded_packages?

randy3k · 2020-02-11T02:49:29Z

document$loaded_packages should be a list of packages inside library() or require() calls. The result should be a superset of env$package of parse_document because it also includes the resolved dependencies.

renkun-ken · 2020-02-11T03:02:20Z

Then the problem is that when we resolve env$packages it becomes more packages (e.g. tidyverse) since attaching some packages lead to attaching other packages.

Suppose there's library(A) in doc1 but A depends on B:

doc1: A -> B

Then doc1$loaded_packages should be A or A,B? In parse_document, env$packages is initially A, then resolve_attached_packages(env$packages) returns A,B. If it should be A,B, then resolving cannot be done in an incremental way, which has poor performance, and this PR is trying to solve this. But if it should be A, then workspace$load_packages is computed from document$loaded_packages and will not have B at all.

randy3k · 2020-02-11T03:08:15Z

doc1$loaded_packages should be c("A", "B"). Why resolving cannot be done in an incremental way?

Initially, doc1$loaded_packages is empty, we pass it to parse_document and compare it to env$package which is "A" - run resolve_attached_packages on setdiff(env$package, doc1$loaded_package) and append it to env$package. Back to parse_callbeck, we set doc1$loaded_packages to env$package.

Am I missing something?

renkun-ken · 2020-02-11T03:11:14Z

Yes, just got your point.

You mean we could do it incrementally per document. This PR has done it on workspace level.
Per document makes good sense. Let me refine the code.

randy3k · 2020-02-11T06:26:26Z

R/completion.R

@@ -51,7 +51,7 @@ constant_completion <- function(token) {
 #' Complete a package name
 #' @keywords internal
 package_completion <- function(token) {
-    installed_packages <- rownames(utils::installed.packages())
+    installed_packages <- .packages(all.available = TRUE)


dont even know this thing exists.

Didn't know either. .packages(all.available = TRUE) should be the fastest way to get all installed packages, find.package() should be the fastest way to check if a package exists. utils::installed.packages() reads all package metadata and thus has much more overhead.

You are right

(randyimac)-~$ Rscript -e "system.time(for(i in 1:100) .packages(all.available = TRUE))" user system elapsed 0.088 0.142 0.232 (randyimac)-~$ Rscript -e "system.time(for(i in 1:100) rownames(utils::installed.packages()))" user system elapsed 0.272 0.047 0.320

randy3k · 2020-02-11T19:52:27Z

LGTM, except that you need to devtools::document() again.

renkun-ken · 2020-02-12T00:04:03Z

Latest roxygen2 7.0+ will create documentation for R6 classes and produce a lot of warnings with the current docs. I reverted to roxygen 6.1.1 to redoc.

renkun-ken added 2 commits February 10, 2020 17:24

Incremental resolving packages

eb50ff0

Add timestamp to logging to better understand performance

61737ac

renkun-ken added 3 commits February 11, 2020 11:46

Incremental resolving packages per document

02d1a81

Update loaded packages on didClose

53ee5ba

Exclude startup packages in resolve_attached_packages

face61f

randy3k reviewed Feb 11, 2020

View reviewed changes

Redocument

71653cd

renkun-ken merged commit 58a826b into REditorSupport:master Feb 12, 2020

This was referenced Feb 12, 2020

Separate resolve task from parse task #205

Merged

Completion only works for packages loaded by default #44

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental resolving packages #204

Incremental resolving packages #204

renkun-ken commented Feb 10, 2020 •

edited

Loading

randy3k commented Feb 10, 2020

renkun-ken commented Feb 10, 2020

randy3k commented Feb 10, 2020

renkun-ken commented Feb 10, 2020

randy3k commented Feb 10, 2020

renkun-ken commented Feb 11, 2020 •

edited

Loading

randy3k commented Feb 11, 2020 •

edited

Loading

renkun-ken commented Feb 11, 2020 •

edited

Loading

randy3k commented Feb 11, 2020 •

edited

Loading

renkun-ken commented Feb 11, 2020

randy3k commented Feb 11, 2020 •

edited

Loading

renkun-ken commented Feb 11, 2020

randy3k Feb 11, 2020

renkun-ken Feb 11, 2020

randy3k Feb 11, 2020

randy3k commented Feb 11, 2020

renkun-ken commented Feb 12, 2020

Incremental resolving packages #204

Incremental resolving packages #204

Conversation

renkun-ken commented Feb 10, 2020 • edited Loading

randy3k commented Feb 10, 2020

renkun-ken commented Feb 10, 2020

randy3k commented Feb 10, 2020

renkun-ken commented Feb 10, 2020

randy3k commented Feb 10, 2020

renkun-ken commented Feb 11, 2020 • edited Loading

randy3k commented Feb 11, 2020 • edited Loading

renkun-ken commented Feb 11, 2020 • edited Loading

randy3k commented Feb 11, 2020 • edited Loading

renkun-ken commented Feb 11, 2020

randy3k commented Feb 11, 2020 • edited Loading

renkun-ken commented Feb 11, 2020

randy3k Feb 11, 2020

Choose a reason for hiding this comment

renkun-ken Feb 11, 2020

Choose a reason for hiding this comment

randy3k Feb 11, 2020

Choose a reason for hiding this comment

randy3k commented Feb 11, 2020

renkun-ken commented Feb 12, 2020

renkun-ken commented Feb 10, 2020 •

edited

Loading

renkun-ken commented Feb 11, 2020 •

edited

Loading

randy3k commented Feb 11, 2020 •

edited

Loading

renkun-ken commented Feb 11, 2020 •

edited

Loading

randy3k commented Feb 11, 2020 •

edited

Loading

randy3k commented Feb 11, 2020 •

edited

Loading