Design and Document the extraction process #4

bmarwell · 2020-01-29T21:46:25Z

Use either a random temp dir, or directory from system path if given.

Delete on exit.

What happens if the user tries to load a library multiple times? Should we keep track of all loaded libraries?

tresf · 2020-01-30T18:40:49Z

Use either a random temp dir, or directory from system path if given.

Some libraries are quite large to extract. In my opinion, persistence should be part of the design, perhaps offer an override option to use temp.

directory from system path if given.

A documented property sounds reasonable as well.

bmarwell · 2020-01-31T09:05:12Z

Absolutely. Default would be $TEMP/nativelib-loader(randomhash). The override is already documented on the front page in the readme.

tresf · 2020-01-31T15:53:35Z

I'm arguing against $TEMP being default. I think persistent should be default, $TEMP being an option.

bmarwell · 2020-01-31T23:34:50Z

I do not like applications which need configuration and do not work out of the box. And other than tmp, no path is guaranteed to have write access. I do not think there is much choice involved.

In an application, you do not have knowledge of the system it runs on.

bmarwell · 2020-02-01T07:45:41Z

I almost forgot. Files are to be deleted on VM termination. This is why I think a random temp dir is a reasonable default.
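
A minimal sketch of the "random temp dir, delete on VM termination" default described above. `TempExtractionDir` and its methods are hypothetical names for illustration, not part of the libloader API, and the shutdown hook only runs on a normal VM exit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper, not part of the libloader API. */
final class TempExtractionDir {

    private TempExtractionDir() {
    }

    /** Creates a randomly named directory below the system temp dir. */
    static Path create() {
        try {
            Path dir = Files.createTempDirectory("nativelib-loader");
            // Best-effort cleanup when the VM terminates normally.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(dir)));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes children before parents; failures are ignored (best effort). */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.deleteIfExists(path);
                } catch (IOException ignored) {
                    // A file may still be in use and therefore not deletable.
                }
            });
        } catch (IOException ignored) {
            // Nothing sensible to do during shutdown.
        }
    }
}
```

Because cleanup is best effort, files can be left behind if the VM is killed or a library is still in use.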

tresf · 2020-02-01T07:50:09Z

You're wasting IO, CPU and time doing it this way, with no justification. Keeping them around by default adds a small amount of storage+versioning to the benefit of everything else.

Temp by default is bad design. Cache always wins in a performance battle, let users kill that explicitly please.

bmarwell · 2020-02-01T08:05:16Z

Well, another possibility is to use something like
$HOME/.cache/libloader/<jarname>/, so they could be reused on the next start.

But it's wasting more space as we couldn't delete files there (they might be in use by other processes), and the speed benefit is probably negligible. Also, temp is more likely to reside on an SSD than home.

Or what do you mean by "cache"?

I would think that's exactly what temp space is for. Maybe you haven't seen my previous comment about deleting files on VM exit?

tresf · 2020-02-01T08:13:45Z

Unless over-provisioned, SSDs are (in some designs, dangerously) volatile, so faster IO isn't really an excuse not to persist. Furthermore, I see no evidence the $HOME statement is true. Environments are vastly different based on where the VM runs and there are simply too many types of systems to be able to make a broad statement like that.

Regardless, arguing against caching is a bad idea. Deleting on VM exit isn't caching, really. Not all VM implementations lifecycle the same, and -- as a real world example -- apps that access native hardware (such as the JSSC port we maintain) are most likely to live in end-user space, reloading the entire VM each use, seeing no benefit to this temp system.

tresf · 2020-02-01T08:23:21Z

Well, another possibility is to use something like
$HOME/.cache/libloader/<jarname>/, so they could be reused on the next start.

This is the default behavior I'd expect. Location isn't as important as the reusability. Each system has its own persistence models and various places that are "standard". Home is a nice Unix concept that's been reliable (albeit non-standard for Win/Mac) but I've never seen someone complain about it, so it's as good a place as any in my opinion.

tresf · 2020-02-01T08:25:44Z

I'm not familiar with .cache, but I more commonly see .appname (e.g. .libloader) in the wild (.gimp, .pidgin, etc.).

bmarwell · 2020-02-01T08:45:30Z

A lot of apps pollute your home folder on Linux, because they do not obey standards.

https://freedesktop.org/wiki/Software/xdg-user-dirs/

And

https://wiki.archlinux.org/index.php/XDG_Base_Directory#User_directories

If the environment variable XDG_CACHE_HOME exists, that would be a good first choice, wouldn't it?
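
A sketch of that lookup order, assuming a fallback to `$HOME/.cache` when the variable is unset, empty, or relative (the XDG Base Directory specification asks implementations to ignore relative paths). `CacheDirResolver` and the `libloader` subdirectory name are illustrative assumptions, not existing code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper, not part of the libloader API. */
final class CacheDirResolver {

    private CacheDirResolver() {
    }

    /**
     * Prefers XDG_CACHE_HOME, otherwise falls back to ~/.cache.
     * The environment is passed in so the logic can be tested.
     */
    static Path resolve(Map<String, String> env, String userHome) {
        String xdg = env.get("XDG_CACHE_HOME");
        if (xdg != null && !xdg.isEmpty() && Paths.get(xdg).isAbsolute()) {
            return Paths.get(xdg, "libloader");
        }
        return Paths.get(userHome, ".cache", "libloader");
    }
}
```

In an application this would be called as `CacheDirResolver.resolve(System.getenv(), System.getProperty("user.home"))`. Windows and macOS would need their own branches, which this sketch leaves out.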

tresf · 2020-02-01T08:48:08Z

Freedesktop is a slowly maturing Linux standard, and the Linux desktop is by far the most fragmented, ever.

Polluting is a bit harsh. Applying 2% of the Desktop's standards on 98% of systems is littering. 😉

Anyway, thanks for the link.

bmarwell · 2020-02-01T08:48:56Z

So what I read from here:

https://superuser.com/a/720848

Those files actually are cache files. Maybe also add an option not just to change directories, but to treat them as temp files instead. That way everyone would be happy. And in this case, I could live with .cache as a default.

Top level dot directories are usually configuration.

tresf · 2020-02-01T09:00:02Z

I hadn't realized XDG added a cache spec in 2010. That's obviously the right home for the Linux/Unixes. Mac and Windows have different places that are well documented, but I'm not aware of a dedicated cache directory like that.

bmarwell · 2020-02-01T11:12:36Z

I like to give ideas some thought, even (or especially) if I like them - like caching.
However, there are some serious potential issues I would not be willing to accept, and if you sell a program to customers, you might not want to either.

First of all, someone could run 32-bit and 64-bit JVMs.
So we might need to account for that in the directory structure, e.g. $CACHEDIR/arch/libabc.so. No problem here, just having two copies (one for each architecture).

A bigger problem is version updates and library versioning.
You might want to run the same tool in two versions one after another or even at the same time. Unlikely with jssc, but not impossible. More probable with other libraries.
That means one Java program would certainly load the wrong library or would have to be configured by hand to use a temp dir instead, whereas for the user, it just stopped working.
I certainly do not like software which does not allow returning to older versions easily.

We might add the jar file where the so originates from to the path, but that is not enough either: there is no guarantee a jar file name contains the version number. The only thing which would be safe is a hash. But that would make a cache directory useless, as a hash requires scanning the class path first for that file and, well, reading everything in for a hash. That's even worse.
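
To make the cost of the hash option concrete, here is a sketch of a content hash over a library stream. It has to read every byte before the cache can even be consulted, which is the overhead described above. `ContentHash` is a hypothetical name; SHA-256 is an assumption, any stable digest would do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of the libloader API. */
final class ContentHash {

    private ContentHash() {
    }

    /** Reads the whole stream and returns its SHA-256 digest as lowercase hex. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string could serve as a directory name below the cache dir, so two builds of the same library never collide.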

I was not even talking about versions of the library. Imagine you want to load("jssc", "5.0.0") to get libjssc.5.0.0.so on Linux. If it doesn't exist in the cache, you cannot just go for libjssc.so in the cache because the link might be broken. You'd have to scan the jars first anyway.

That said, what are other alternatives?

Best alternative if you ship your application: ship it with a start script which adds a path via -Djava.library.path. I think this library should try the Java native approach first anyway, for performance.

Other than that, I really do not see a lot of IO going on when using a temp dir.
First of all, looking up a file you know the name of in the classpath is cheap. You don't need to scan everything, Java knows its files.
Extracting is a one-time process; this library will have a write-only table with libraries it already loaded.
Also, temp dirs are more likely to be on tmpfs (a RAM FS) or an SSD. Not making assumptions here, just talking about statistics.
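
The table of already loaded libraries could look roughly like this sketch, which also covers the earlier question about loading the same library multiple times. `LoadedLibraries` and `getOrExtract` are hypothetical names, and the extractor function stands in for whatever actually copies the file out of the jar.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical registry, not part of the libloader API. */
final class LoadedLibraries {

    // Library name -> location it was extracted to during this VM run.
    private final ConcurrentMap<String, Path> extracted = new ConcurrentHashMap<>();

    /**
     * Runs the extractor only on the first request for a name;
     * later requests return the remembered path.
     */
    Path getOrExtract(String name, Function<String, Path> extractor) {
        return extracted.computeIfAbsent(name, extractor);
    }
}
```

`computeIfAbsent` on a `ConcurrentHashMap` applies the function at most once per key, so concurrent callers do not extract the same library twice.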

So, what are your thoughts? If I overlooked anything let me know. I'd be really happy to add caching to this project. I also have no problem implementing a "look up cache first" switch, which defaults to false (because of the potential issues I mentioned).

Please do not take offense at this. I just want to give it enough thought and think it through thoroughly to avoid unnecessary bugs. If I made a false assumption, just let me know. I'm always open to good arguments, which is why I tried to think this through in the first place.

Ben

tresf · 2020-02-01T21:51:15Z

I feel that versioning is up to those implementing this. JSSC did this by hard-coding versioning into the library name. This seems completely reasonable, no? Perhaps if version is supplied, we persist, if it's not we temp.

bmarwell · 2020-02-02T01:39:39Z

This seems completely reasonable, no?

For caching, I'd call this almost mandatory.

Perhaps if version is supplied, we persist, if it's not we temp.

That sounds reasonable! I think we could make another ticket, and I could start implementing it as soon as there is a working release.
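
A sketch of that rule, under the assumption that a versioned library goes to `<cacheDir>/<arch>/<version>/` and an unversioned one to the per-run temp dir. `ExtractionTarget` and the directory layout are illustrative only; the real layout is still undecided here.

```java
import java.nio.file.Path;

/** Hypothetical helper, not part of the libloader API. */
final class ExtractionTarget {

    private ExtractionTarget() {
    }

    /**
     * Picks where a library file should be extracted to.
     *
     * @param version the requested library version, or null if none was given
     */
    static Path choose(Path cacheDir, Path tempDir, String arch, String fileName, String version) {
        if (version == null || version.isEmpty()) {
            // No version: nothing safe to key a cache on, so use the temp dir.
            return tempDir.resolve(fileName);
        }
        // Versioned: persistent location, separated by architecture and version.
        return cacheDir.resolve(arch).resolve(version).resolve(fileName);
    }
}
```

Keeping the architecture in the path means a 32-bit and a 64-bit JVM never pick up each other's files.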

It might have a positive effect if you start your programme every few hours or so and it would extract at least a few hundred MiB.

(I still do not really get where a performance issue could occur, tbh. I still don't think there are many programs which need hundreds of MiB of shared objects. Even if so, it's a one-time penalty per program start. If this is an issue, that program could also ship the libraries pre-extracted and use the Java native lib dir.)

bmarwell · 2020-02-02T13:22:49Z

@tresf I updated the README. I thought I was going to implement this like the original libloader did it.

        /* will load 'natives/linux-x86_64-64/libjssc.so' on linux. */
        final LibLoaderResult result = libLoader.loadLibrary("jssc");

        // or load a specific version 'natives/linux-x86_64-64/libjssc-5.0.0.so'.
        // if it doesn't exist, will load 'natives/linux-x86_64-64/libjssc.so'.
        final LibLoaderResult versionedResult = libLoader.loadLibrary("jssc", "5.0.0");
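
The fallback order in those comments could be expressed as a list of candidate resource paths, tried in order. This is only a sketch: `LibraryCandidates` is a hypothetical name, and the `lib` prefix and `.so` suffix are hard-coded for Linux.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the libloader API. */
final class LibraryCandidates {

    private LibraryCandidates() {
    }

    /**
     * Returns classpath resource paths in lookup order:
     * the versioned file first (if a version was given), then the plain one.
     */
    static List<String> resourcePaths(String platformDir, String name, String version) {
        List<String> candidates = new ArrayList<>();
        String base = "natives/" + platformDir + "/lib" + name;
        if (version != null && !version.isEmpty()) {
            candidates.add(base + "-" + version + ".so");
        }
        candidates.add(base + ".so");
        return candidates;
    }
}
```

A loader would take the first candidate that `ClassLoader.getResource` can find.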

bmarwell · 2020-02-02T15:55:05Z

@tresf you got me. If you start a JVM frequently, it sure is a waste of CPU and IO.

I am still figuring out how to solve this: https://github.com/java-native/libloader/wiki/Extraction-Process#caching-as-default

tresf · 2020-02-04T18:51:54Z

FYI, I'm manually unsubscribing from this thread. Any further feedback will require a manual @tresf tag.

bmarwell added this to the 1.0.0 milestone Jan 29, 2020

bmarwell self-assigned this Jan 29, 2020

bmarwell mentioned this issue Feb 2, 2020

Implement caching #6

Open

tresf mentioned this issue Jul 27, 2021

Add directory layout according to os-maven-plugin and osdetector-gradle-plugin scijava/native-lib-loader#32

Closed
