This repository has been archived by the owner on Oct 14, 2020. It is now read-only.

Design and Document the extraction process #4

Open
bmarwell opened this issue Jan 29, 2020 · 20 comments

@bmarwell
Owner

Use either a random temp dir, or directory from system path if given.

Delete on exit.

What happens if the user tries to load a library multiple times? Should we keep track of all loaded libraries?
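A minimal sketch of the two points above (random temp dir unless a directory is given, delete on exit). The property name `libloader.extract.dir` and the class and method names are hypothetical and not taken from the project:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class ExtractionDirSketch {
    /** Hypothetical property name, used here for illustration only. */
    static final String DIR_PROPERTY = "libloader.extract.dir";

    /** Returns the configured directory if the property is set, else a fresh random temp dir. */
    static Path resolveExtractionDir() throws IOException {
        String configured = System.getProperty(DIR_PROPERTY);
        if (configured != null && !configured.isEmpty()) {
            return Files.createDirectories(Paths.get(configured));
        }
        // createTempDirectory appends a random suffix to the prefix.
        Path tempDir = Files.createTempDirectory("nativelib-loader");
        // Registered first, so it is deleted last (deleteOnExit runs in reverse order).
        tempDir.toFile().deleteOnExit();
        return tempDir;
    }

    /** Copies a library stream into the directory and schedules the copy for deletion on VM exit. */
    static Path extract(InputStream in, Path dir, String fileName) throws IOException {
        Path target = dir.resolve(fileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        target.toFile().deleteOnExit();
        return target;
    }
}
```

Note that `deleteOnExit` only runs on a normal VM shutdown, so files can be left behind after a crash.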

@bmarwell bmarwell added this to the 1.0.0 milestone Jan 29, 2020
@bmarwell bmarwell self-assigned this Jan 29, 2020
@tresf

tresf commented Jan 30, 2020

> Use either a random temp dir, or directory from system path if given.

Some libraries are quite large to extract. In my opinion, persistence should be part of the design, perhaps with an override option to use temp.

> directory from system path if given.

A documented property sounds reasonable as well.

@bmarwell
Owner Author

Absolutely. The default would be $TEMP/nativelib-loader(randomhash). The override is already documented on the front page in the readme.

@tresf

tresf commented Jan 31, 2020

I'm arguing against $TEMP being the default. I think persistence should be the default, with $TEMP as an option.

@bmarwell
Owner Author

I do not like applications which need configuration and do not work out of the box. And other than tmp, no path is guaranteed to have write access. I do not think there is much choice involved.

In an application, you have no knowledge of the system it runs on.

@bmarwell
Owner Author

bmarwell commented Feb 1, 2020

I almost forgot. Files are to be deleted on VM termination. This is why I think a random temp dir is a reasonable default.

@tresf

tresf commented Feb 1, 2020

You're wasting IO, CPU and time doing it this way, with no justification. Keeping them around by default costs a small amount of storage plus versioning, to the benefit of everything else.

Temp by default is bad design. Cache always wins in a performance battle; let users kill that explicitly, please.

@bmarwell
Owner Author

bmarwell commented Feb 1, 2020

Well, another possibility is to use something like
$HOME/.cache/libloader/<jarname>/, so they could be reused on the next start.

But it's wasting more space, as we couldn't delete files there (they might be in use by other processes), and the speed benefit is probably negligible. Also, temp is more likely to reside on an SSD than home.

Or what do you mean by "cache"?

I would think that's exactly what temp space is for. Maybe you haven't seen my previous comment about deleting files on VM exit?

@tresf

tresf commented Feb 1, 2020

Unless over-provisioned, SSDs are (in some designs, dangerously) volatile, so faster IO isn't really an excuse not to persist. Furthermore, I see no evidence the $HOME statement is true. Environments differ vastly based on where the VM runs, and there are simply too many types of systems to be able to make a broad statement like that.

Regardless, arguing against caching is a bad idea. Deleting on VM exit isn't caching, really. Not all VM implementations have the same lifecycle, and -- as a real world example -- apps that access native hardware (such as the JSSC port we maintain) are most likely to live in end-user space, reloading the entire VM on each use and seeing no benefit from this temp system.

@tresf

tresf commented Feb 1, 2020

> Well, another possibility is to use something like
> $HOME/.cache/libloader/<jarname>/, so they could be reused on the next start.

This is the default behavior I'd expect. Location isn't as important as reusability. Each system has its own persistence models and various places that are "standard". Home is a nice Unix concept that's been reliable (albeit non-standard for Win/Mac), and I've never seen someone complain about it, so it's as good a place as any in my opinion.

@tresf

tresf commented Feb 1, 2020

I'm not familiar with .cache, but I more commonly see .appname (e.g. .libloader) in the wild (such as .gimp, .pidgin, etc.).

@bmarwell
Owner Author

bmarwell commented Feb 1, 2020

A lot of apps pollute your home folder on Linux, because they do not obey standards.

https://freedesktop.org/wiki/Software/xdg-user-dirs/

And

https://wiki.archlinux.org/index.php/XDG_Base_Directory#User_directories

If the environment variable XDG_CACHE_HOME exists, that would be a good first choice, wouldn't it?
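A minimal sketch of that lookup order, assuming a fallback to `$HOME/.cache` (the default the XDG Base Directory spec defines when `XDG_CACHE_HOME` is unset) and a hypothetical `libloader` subdirectory. The class and method names are made up for illustration:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

final class CacheDirSketch {
    /**
     * Prefers XDG_CACHE_HOME when it is set to an absolute path,
     * otherwise falls back to <userHome>/.cache.
     */
    static Path resolveCacheDir(Map<String, String> env, String userHome) {
        String xdg = env.get("XDG_CACHE_HOME");
        // The XDG spec requires absolute paths; relative values are ignored.
        if (xdg != null && !xdg.isEmpty() && Paths.get(xdg).isAbsolute()) {
            return Paths.get(xdg, "libloader");
        }
        return Paths.get(userHome, ".cache", "libloader");
    }
}
```

A caller would typically pass `System.getenv()` and `System.getProperty("user.home")`. Taking them as parameters keeps the method easy to test.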

@tresf

tresf commented Feb 1, 2020

Freedesktop is a slowly maturing Linux standard, and the Linux desktop is by far the most fragmented, ever.

Polluting is a bit harsh. Applying 2% of the Desktop's standards on 98% of systems is littering. 😉

Anyway, thanks for the link.

@bmarwell
Owner Author

bmarwell commented Feb 1, 2020

So what I read from here:

https://superuser.com/a/720848

Those files really are cache files. Maybe we should also add an option not just to change directories, but to treat them as temp files instead. That way everyone would be happy. And in this case, I could live with .cache as a default.

Top level dot directories are usually configuration.

@tresf

tresf commented Feb 1, 2020

I hadn't realized XDG added a cache spec in 2010. That's obviously the right home for the Linux/Unixes. Mac and Windows have different places that are well documented, but I'm not aware of a dedicated cache directory like that.

@bmarwell
Owner Author

bmarwell commented Feb 1, 2020

I like to give ideas some thought, even (or especially) if I like them, like caching.
However, there are some serious potential issues I would not be willing to accept, and if you sell a program to customers, you might not want to accept them either.

First of all, someone could run both 32-bit and 64-bit JVMs.
So we might need to account for that in the directory structure, e.g. $CACHEDIR/arch/libabc.so. No problem here, it just means having two copies (one for each architecture).
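A small sketch of such a layout, assuming the architecture string comes from the `os.arch` system property (which generally describes the running JVM) and the file name from `System.mapLibraryName`. The class and method names are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ArchPathSketch {
    /** Builds <cacheDir>/<arch>/<platform-specific library file name>. */
    static Path cachedLibraryPath(Path cacheDir, String arch, String libName) {
        // mapLibraryName("abc") yields e.g. "libabc.so" on Linux and "abc.dll" on Windows.
        return cacheDir.resolve(arch).resolve(System.mapLibraryName(libName));
    }

    public static void main(String[] args) {
        Path path = cachedLibraryPath(Paths.get("cache"), System.getProperty("os.arch"), "abc");
        System.out.println(path);
    }
}
```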

But a bigger problem is version updates and library versioning.
You might want to run two versions of the same tool one after another, or even at the same time. Unlikely with jssc, but not impossible. More probable with other libraries.
That means one piece of Java code would certainly load the wrong library, or would have to be configured by hand to use a temp dir instead, whereas for the user, it just stopped working.
I certainly do not like software which does not allow returning to older versions easily.

We might add the name of the jar file the .so originates from to the path, but that is not enough either: there is no guarantee a jar file name contains the version number. The only safe thing would be a hash. But that would make a cache directory useless, as a hash requires scanning the class path for that file first and, well, reading everything in to compute the hash. That's even worse.
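To make the cost concrete, here is a minimal sketch of hashing a library's content with SHA-256 from the JDK's `MessageDigest`. The class and method names are hypothetical. The loop has to read every byte of the stream, which is the IO the paragraph above is concerned about:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ContentHashSketch {
    /** Reads the whole stream and returns its SHA-256 digest as lowercase hex. */
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```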

I was not even talking about versions of the library. Imagine you want to load("jssc", "5.0.0") to get libjssc.5.0.0.so on Linux. If it doesn't exist in the cache, you cannot just go for libjssc.so in the cache, because the link might be broken. You'd have to scan the jars first anyway.

That said, what are other alternatives?

The best alternative if you ship your application: ship it with a start script which adds a path to -Djava.library.path. I think this library should try the Java native approach first anyway. For performance.

Other than that, I really do not see a lot of IO going on when using a temp dir.
First of all, looking up a file whose name you know in the classpath is cheap. You don't need to scan everything; Java knows its files.
Extracting is a one-time process: this library will have a write-only table of the libraries it has already loaded.
Also, temp dirs are more likely to be on tmpfs (a RAM FS) or an SSD. I am not making assumptions here, just talking about statistics.
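A minimal sketch of such a table of already loaded libraries, assuming a `ConcurrentHashMap` keyed by library name. The class name and the `extractor` callback are hypothetical. Entries are only ever added, so a second request for the same name returns the stored path without extracting again:

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class LoadedLibraryRegistry {
    /** Library name mapped to the path it was extracted to; entries are never removed. */
    private final ConcurrentMap<String, Path> loaded = new ConcurrentHashMap<>();

    /**
     * Runs the extractor only for the first request of a given name;
     * later calls return the path recorded by that first call.
     */
    Path loadOnce(String name, Function<String, Path> extractor) {
        return loaded.computeIfAbsent(name, extractor);
    }
}
```

In a real loader, the extractor would copy the file out of the jar and call `System.load` on the result.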

So, what are your thoughts? If I overlooked anything, let me know. I'd be really happy to add caching to this project. I also have no problem implementing a "look up cache first" switch, which defaults to false (because of the potential issues I mentioned).

Please do not take offense at this. I just want to give it enough thought and think it through thoroughly to avoid unnecessary bugs. If I made a false assumption, just let me know. I'm always open to good arguments, which is why I tried to think this through in the first place.

Ben

@tresf

tresf commented Feb 1, 2020

I feel that versioning is up to those implementing this. JSSC did this by hard-coding versioning into the library name. This seems completely reasonable, no? Perhaps if version is supplied, we persist, if it's not we temp.

@bmarwell
Owner Author

bmarwell commented Feb 2, 2020

> This seems completely reasonable, no?

For caching, I'd call this almost mandatory.

> Perhaps if version is supplied, we persist, if it's not we temp.

That sounds reasonable! I think we could make another ticket, and I could start implementing it as soon as there is a working release.
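The rule agreed on here could be sketched roughly as follows. The enum and class names are hypothetical, and nothing in this thread says the project implemented it this way:

```java
import java.util.Optional;

final class StoragePolicySketch {
    enum Storage { PERSISTENT_CACHE, TEMP_DIR }

    /** Persist only when the caller supplied a non-empty version; otherwise use temp. */
    static Storage choose(Optional<String> version) {
        boolean hasVersion = version.filter(v -> !v.isEmpty()).isPresent();
        return hasVersion ? Storage.PERSISTENT_CACHE : Storage.TEMP_DIR;
    }
}
```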

It might have a positive effect if you start your programme every few hours or so and it has to extract at least a few hundred MiB.

(I still do not really get where a performance issue could occur, tbh. I still don't think there are many programs which need hundreds of MiB of shared objects. Even if there are, it's only a one-time penalty at every program start. If this is an issue, that program could also ship the libraries pre-extracted and use the Java native lib dir.)

@bmarwell
Owner Author

bmarwell commented Feb 2, 2020

@tresf I updated the README. I thought I was going to implement this like the original libloader did it.

        /* Will load 'natives/linux-x86_64-64/libjssc.so' on Linux. */
        final LibLoaderResult loadLibrary = libLoader.loadLibrary("jssc");

        // Or load a specific version 'natives/linux-x86_64-64/libjssc-5.0.0.so'.
        // If it doesn't exist, will load 'natives/linux-x86_64-64/libjssc.so'.
        final LibLoaderResult loadVersionedLibrary = libLoader.loadLibrary("jssc", "5.0.0");
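The fallback described in those comments can be pictured as an ordered list of resource paths to try. This is only a sketch with hypothetical names, hard-coding the Linux `lib<name>.so` pattern from the example. It is not the library's actual lookup code:

```java
import java.util.ArrayList;
import java.util.List;

final class ResourceCandidatesSketch {
    /** Builds lookup candidates in priority order: versioned name first, then unversioned. */
    static List<String> candidates(String platformDir, String name, String version) {
        List<String> result = new ArrayList<>();
        String base = "natives/" + platformDir + "/lib" + name;
        if (version != null && !version.isEmpty()) {
            result.add(base + "-" + version + ".so");
        }
        result.add(base + ".so");
        return result;
    }
}
```

A loader would then take the first candidate for which `ClassLoader.getResource` returns a non-null URL.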

@bmarwell
Owner Author

bmarwell commented Feb 2, 2020

@tresf you got me. If you start a JVM frequently, it sure is a waste of CPU and IO.

I am still figuring out how to solve this: https://github.com/java-native/libloader/wiki/Extraction-Process#caching-as-default

@tresf

tresf commented Feb 4, 2020

FYI, I'm manually unsubscribing from this thread. Any further feedback will require a manual @tresf tag.
