install into a single global directory? #123

Closed
stevengj opened this issue Sep 20, 2018 · 12 comments

Comments

@stevengj
Member

stevengj commented Sep 20, 2018

Currently we install into the deps directory. However, in Julia 1.0, packages are no longer updated in place. This means that every time you update Conda.jl it needs to re-install Anaconda, which is wasteful and breaks things like PyCall that expect libpython to stay in one place.

My thinking is that we should put the root environment into ~/.julia/$JLENV/conda$(CONDA_JL_VERSION) instead, where JLENV is the current Julia package environment. This way, we will only have a single Anaconda installation (per Python major version) that persists across updates.

However, I must admit that I don't fully grok how Pkg environments work. From reading the docs, I guess there is a stack of environments. @StefanKarpinski, is there a way to tell which environment Conda.jl was installed into? Am I thinking about this in the right way?

@tkf
Member

tkf commented Sep 21, 2018

Isn't it wasteful to have the base conda environment for each Julia environment? I'd suggest:

  • ~/.julia/data/Conda/base: base conda installation used from all versions of Julia and Conda.jl
  • ~/.julia/data/Conda/envs/v$X.$Y: default conda environment to be used for Julia version $X.$Y.
  • ~/.julia/data/Conda/envs/$NAMED_ENV: conda environment to be used for named environment $NAMED_ENV (i.e., the idea is to map ~/.julia/environments/$DIRECTORY to ~/.julia/data/Conda/envs/$DIRECTORY, including ~/.julia/environments/v$X.$Y).
  • $PROJECT_PATH/.condajl (where $PROJECT_PATH is the directory s.t. $PROJECT_PATH/Project.toml exists): conda environment to be used for the project/environment at $PROJECT_PATH. Ideally this should be optional, since creating a conda environment is much more expensive than how Pkg3 creates a Julia environment. But I think it requires package options (JuliaLang/Juleps#38).
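
As a rough illustration of the mapping in the list above, here is a minimal sketch. It is written in Python purely for illustration; the helper name `conda_env_for` and the assumption that the depot is `~/.julia` are mine, and nothing here is implemented in Conda.jl.

```python
from pathlib import Path


def conda_env_for(julia_env, depot=Path.home() / ".julia"):
    """Hypothetical helper: map a Julia environment directory to the
    conda environment directory proposed in the list above."""
    julia_env = Path(julia_env)
    depot = Path(depot)
    named_envs = depot / "environments"
    # Named environments (including the default v$X.$Y) map to
    # <depot>/data/Conda/envs/$DIRECTORY.
    if julia_env.parent == named_envs:
        return depot / "data" / "Conda" / "envs" / julia_env.name
    # Any other project directory gets a project-local .condajl.
    return julia_env / ".condajl"


if __name__ == "__main__":
    depot = Path("/home/user/.julia")  # example path, not a real default
    print(conda_env_for(depot / "environments" / "v1.0", depot))
    print(conda_env_for("/home/user/proj", depot))
```

The base installation (~/.julia/data/Conda/base) is deliberately not part of this mapping, since it would be shared by every environment.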

Notes: The scheme above (i.e., sharing ~/.julia/data/Conda/base across all Julia versions and environments) is useful because conda has some basic de-duplication mechanisms. For example, it tries to use hard links in conda environments when it can. The conda installation directory (~/.julia/data/Conda/base above; usually ~/miniconda3 etc.) has a global cache, such as pkgs, which contains downloaded package archives and their decompressed directories. Using these mechanisms to save disk space requires sharing the base installation.
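
To make the hard-link point concrete, here is a small sketch of what a hard link is at the filesystem level. It only demonstrates the general mechanism (two names for the same data), not how conda links packages internally; the file names are made up for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file and a hard link to it; report whether both names
    refer to the same file and how many links the file has."""
    with tempfile.TemporaryDirectory() as root:
        # Stands in for a file in the shared package cache.
        cache_file = os.path.join(root, "cache_libfoo.bin")
        # Stands in for the same file as seen from an environment.
        env_file = os.path.join(root, "env_libfoo.bin")
        with open(cache_file, "wb") as f:
            f.write(b"x" * 1024)
        # A hard link adds a second name without copying the data.
        os.link(cache_file, env_file)
        same = os.path.samefile(cache_file, env_file)
        links = os.stat(cache_file).st_nlink
        return same, links


if __name__ == "__main__":
    print(hard_link_demo())
```

Hard links only work within a single filesystem, which is one more reason the cache and the environments benefit from living under one shared installation.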

Some discussion points:

  • In the above suggestion I am assuming that ~/.julia/data/$PACKAGE is the location that is dedicated to $PACKAGE. However, this does not fit well with Pkg3's model where a package is identified by a UUID. So maybe it should be something like ~/.julia/data/8f4d0f93-b110-5947-807f-2305c1781a2d?
  • Maybe use the standard location ~/miniconda instead of ~/.julia/data/Conda/base?
  • Is data the right name? opt? share?

Edit: swap "data" and "share"
Edit2: Add "Notes"

@stevengj
Member Author

I agree it's fine for different Pkg environments to correspond to different conda virtualenvs.

share in Unix refers to a directory for architecture-independent data. I think ~/.julia/conda{2,3} is fine here and will be convenient — the Conda.jl package is kind of in a special situation because it would be unwise for other packages to install whole Anaconda distros.

I don't think we should use ~/miniconda — the whole point of Conda.jl is to install a Julia-specific Anaconda distro so that it doesn't get messed up by whatever other stuff the user might have installed.

@tkf
Member

tkf commented Sep 24, 2018

I thought it'd be nice to have a scheme for a package-specific data directory. For example, a dataset library (like RDatasets.jl) could use such a location across all environments. But this is more of a Pkg.jl enhancement idea.

I understand that the aim of Conda.jl is to isolate it from the user's ~/{mini,ana}conda. My point was that you only need a single conda installation for Julia-specific usage. Everything else can be a conda (virtual) environment. Those environments are isolated from the base environment (in principle, unless conda has some critical bugs). You can even create a Python 2 conda environment with Miniconda3. Furthermore, you can use ~/miniconda to "bootstrap" Conda.jl's main environment by installing the conda package in it. This way, you don't need to touch the ~/miniconda base environment. But this last point was probably too eager and was not the main point.

@stevengj
Member Author

I'm just worried about user-maintained ~/miniconda base environments being broken in some way — I've seen too many bit-rotted Python installations to trust something we find in a non-Julia directory.

@stevengj
Member Author

(See https://github.com/oxinabox/DataDeps.jl for other kinds of data.)

@tkf
Member

tkf commented Sep 24, 2018

> I'm just worried about user-maintained ~/miniconda base environments being broken in some way

Sure, I understand the worry. How about reusing the same miniconda installation for all Julia versions and all Julia environments? (They of course can have different environments.)

> DataDeps.jl

It looks like they have a similar discussion too: oxinabox/DataDeps.jl#48

@stevengj
Member Author

The lack of persistent package options (JuliaLang/Juleps#38) is a problem here too, because we currently have no way of "remembering" whether the user selected Python 2 or Python 3 or some custom environment.

It's pretty urgent that we get some fix here, even if it is suboptimal. Upgrading Conda.jl currently takes forever, breaks PyCall (because the libpython path changes), and wastes gigabytes of space.

@tkf
Member

tkf commented Oct 11, 2018

Why not use ~/.julia/data/Conda/envs/v$(VERSION.major).$(VERSION.minor)? This would be forward-compatible with what I suggested in JuliaLang/Pkg.jl#777 (comment). Or maybe even ~/.julia/environments/v$(VERSION.major).$(VERSION.minor)/condajl?

Isn't it simple to fix once the location is decided?

@StefanKarpinski

The Julia version doesn’t seem necessary or sufficient for isolating Conda setups.

@tkf
Member

tkf commented Oct 11, 2018

I agree. But I don't think it's reasonable to install miniconda for each Julia environment, since that consumes much more space. That's why I suggested a hybrid approach based on a private_env package option. JuliaLang/Pkg.jl#777 (comment) (added: read #123 (comment) first)

Furthermore, to obtain some degree of de-duplication, so that it becomes reasonable to use a separate conda environment for some Julia environments, we need JuliaLang/Pkg.jl#777.

@stevengj
Member Author

stevengj commented Oct 11, 2018

Using the Julia version will also lead to too many Conda installations; there is no strong reason not to share conda installations between Julia 1.0 and Julia 1.1, for example, or for that matter with Julia 0.7.

In the short run, I'm starting to feel like it will be better to just install conda in ~/.julia/conda (shared between all Julia versions and environments) and worry later about adding the option to let a Julia project/environment install its own conda virtualenv with a given set of packages. The latter seems like it depends on Pkg option support anyway.
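
A minimal sketch of why the shared location helps, in Python for illustration only. The helper names are hypothetical and the package directory names (`abc12`, `def34`) are made up: the point is just that a root under the package's own directory moves whenever the package is installed to a new directory, while a root under the depot stays put.

```python
from pathlib import Path


def per_package_root(package_install_dir):
    """Hypothetical: conda root inside the package's deps directory,
    as in the current behaviour described in this issue."""
    return Path(package_install_dir) / "deps"


def shared_root(depot):
    """Hypothetical: one conda root under the depot, shared between
    all Julia versions and environments."""
    return Path(depot) / "conda"


if __name__ == "__main__":
    depot = Path("/home/user/.julia")  # example path
    # Two made-up install directories for successive package versions.
    old = depot / "packages" / "Conda" / "abc12"
    new = depot / "packages" / "Conda" / "def34"
    print(per_package_root(old) != per_package_root(new))  # root moved
    print(shared_root(depot))  # same before and after the update
```

With the shared root, anything that recorded the libpython location (as PyCall does) keeps pointing at a valid path after a Conda.jl update.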

@StefanKarpinski

Yes, that seems pretty reasonable to me.


3 participants