Neither __SOURCE_DIRECTORY__ nor Paket.Packages are unique for each notebook in a docker container #112
Comments
With Paket, if you have two notebooks with different versions it's going to cause churn but they seem to have coexisted. We discussed moving to a model with a unique packages folder per notebook but that may require quite a lot of duplication of binaries. I'm usually of the opinion that disc space is cheap but plotting/FsLab is a lot for every experimental notebook. |
Is this causing an issue when you switch between two notebooks? |
It causes a problem when using It would also cause a problem with |
@cgravill On Azure notebooks I find myself with 5-6 similar duplicated notebooks, all with slight variations on package lists. Maybe Paket uses a package cache (for downloads) which is independent of the "packages" directory. So that could be shared, while the individual notebooks get isolated "packages" directories. But yes, the cost could certainly be high in disk space. It's an issue for any F# scripting model that acquires packages, to be honest - how isolated should they be? |
To put some figures on the packages directories: nuget FsLab = 1.0.2 framework: net451 framework: net462 nuget XPlot.Plotly = 1.4.2 framework: net451 framework: net462 On Windows 10, Paket version 3.31.2 (can provide .lock file needed) |
My ideal is a machine-wide immutable store of downloaded packages. Perhaps we should use: https://fsprojects.github.io/Paket/nuget-dependencies.html#Putting-the-version-no-into-the-path though it'll cause annoyance around knowing the version you've got in a given notebook. Perhaps this could be combined with generating referencing scripts. This still doesn't solve conflicting changes to the dependencies file, however. |
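For readers unfamiliar with the option linked above, here is a minimal sketch of a paket.dependencies file that puts the version number into the package path. The package and version are borrowed from the figures earlier in the thread; the source URL is an assumption for illustration, and the exact folder name Paket produces should be checked against the linked documentation.

```text
source https://api.nuget.org/v3/index.json

// Assumed example: with version_in_path the package folder name
// includes the version, so two versions do not overwrite each other.
nuget XPlot.Plotly = 1.4.2 version_in_path: true
```

As the comment notes, this only separates the package folders on disk. It does not stop two notebooks from editing the same paket.dependencies in conflicting ways.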
Each notebook needs a logically separate paket.dependencies. This could be represented as a group or as a separate dependencies file. This is true whether or not version numbers go on the directories, to prevent version resolution from churning/conflicting, and to allow reference scripts to work. As a result, I'm afraid putting version numbers on doesn't help at all. Unfortunately, when using groups or separate paket.dependencies, the packages are duplicated.
The core issue, I think, is whether a user's notebooks represent a single project or a collection of independent projects. When they represent a single project, then having a single paket.dependencies, and a single reference script, is not problematic: the notebooks logically share dependencies, and this saves (significantly) on disk space. @cgravill has users whose use case is like this, where their collection of notebooks (possibly many dozens of them) share on the order of 150 - 200 MB of packages (based on the number above).
One possible solution is to use the notebook's directory, so that notebooks in a single directory share a paket.dependencies, but other directories or sub-directories do not. This would provide an "organising principle", but I'm afraid it is too subtle, and would leave users confused on both sides: some wondering why notebooks in the same directory have conflicting packages and others wondering why their directory tree of notebooks doesn't share a paket.dependencies. |
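To make the "group per notebook" idea concrete, here is a rough sketch of a paket.dependencies file using Paket groups. The group names NotebookA and NotebookB are hypothetical, the source URL and framework restriction are assumptions, and the packages are the ones quoted earlier in the thread.

```text
// Hypothetical group for one notebook
group NotebookA
    source https://api.nuget.org/v3/index.json
    framework: net462
    nuget FsLab = 1.0.2

// Hypothetical group for another notebook, resolved independently
group NotebookB
    source https://api.nuget.org/v3/index.json
    framework: net462
    nuget XPlot.Plotly = 1.4.2
```

Each group is resolved separately and restored into its own location, which is what isolates the notebooks from each other and also what causes the duplication of binaries described above.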
As a side note: in an IPython notebook, dependencies are installed using pip, and they are installed globally. In other words, effectively the same as the system that's currently in place for IFSharp. That doesn't mean it's a good system, it just means I was hoping IPython had a better approach but it doesn't appear to :) |
Yes, a logically separate expression of dependencies seems the right approach. The I haven't tested that but I think it would let us get stable dependencies per notebook. However, it would lead to churn on the packages, e.g. the packages/Newtonsoft.Json directory would constantly switch between versions, causing IO. What I had in mind with adding the version number is that this could be prevented. However, we might run into issues with Paket cleaning up references: https://fsprojects.github.io/Paket/reference/paket-garbagecollection.html If we can resolve them, then combined with the auto-generation of referencing scripts from #121 it would shield the notebook from the noisy changes happening underneath. I haven't tried any of the above properly yet so it may run into issues, but I hope it's helpful. If it works it would give one copy of referenced dlls, minimise IO, and give consistent stable dependencies. |
The newer storage:none mechanism of Paket would be a nice way to avoid the IO noise while keeping everything safe: https://fsprojects.github.io/Paket/dependencies-file.html#Disable-packages-folder and pretty much the ideal I hoped for! While it's still marked as beta, I've used it elsewhere, particularly on netcore projects. |
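As a sketch of what that setting looks like, here is a minimal paket.dependencies with the packages folder disabled. The source URL and the chosen package are assumptions for illustration; the storage option itself is described on the page linked above.

```text
source https://api.nuget.org/v3/index.json

// Do not extract into a local "packages" folder; packages are
// instead referenced from the user-wide NuGet cache.
storage: none

nuget FsLab = 1.0.2
```

With no per-notebook packages directory to rewrite, switching between notebooks with different versions should no longer churn files on disk. Resolution conflicts in a shared dependencies file are a separate problem that this setting does not address.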
I've done some experiments with the storage:none and it works well. There's a snag in that it makes any native dependencies awkward to load, which are often used in a notebook scenario. There's also a planned extension to #r which would make this much better: |
@cgravill I stumbled over this issue. If you mean with "native dependencies" the unmanaged stuff, then there is also an API in paket to do this just fine. You can take a look at this commit where I added support in FAKE for it. |
Yes, unmanaged platform specific libraries. That's very interesting @matthid and your corresponding Paket change fsprojects/Paket#3593 is great, and would mean loading dependencies would be much easier even without storage:none. One of the routes we use for loading dependencies is via the general purpose load scripts |
As mentioned in #106, __SOURCE_DIRECTORY__ is the same /home/nbuser for each notebook process in a docker container. Likewise the System.Environment.CurrentDirectory for each notebook is the same.
Also, the directory used for nuget packages is not unique. This would mean that different notebooks may get different nuget package versions, and may alter the paket.dependencies in conflicting ways.
Both can easily lead to conflicting use of the file system from different notebooks if the current directory is used to store and resolve nuget packages, for example, depending on the technique used to get nuget packages.