Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Julia support to gprc4bmi #144

Merged
merged 31 commits into from
Oct 30, 2023
Merged

Add Julia support to gprc4bmi #144

merged 31 commits into from
Oct 30, 2023

Conversation

sverhoeven
Copy link
Member

Refs #101

test/test_julia.py Outdated Show resolved Hide resolved
Args:
package: Name of package to install.
"""
jl.Pkg.add(package)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specific URLs and git branches can also be installed. See https://pkgdocs.julialang.org/v1/api/#Package-API-Reference

import Pkg
Pkg.add(name="Wflow", rev="BMI")

or

Pkg.add(url="https://github.com/JuliaLang/Example.jl", rev="master")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed install method and use jl.Pkg.add() direct;ly.

@sverhoeven
Copy link
Member Author

Calling a bmi function in julia via a grpc python server from a grpc client causes the process to use 100% cpu and not return a response.

Seems grpc and juliacall can not live together.

Need to add handle_julia_exception from https://blog.esciencecenter.nl/10-examples-of-embedding-julia-in-c-c-66282477e62c
to get better error

Got error:

1.4142135623730951
[7] signal (11.1): Segmentation fault
in expression starting at none:0
jl_lookup_generic_ at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2819 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2936
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
ijl_call1 at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:230
main at /usr/local/bin/run_bmi_server (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /usr/local/bin/run_bmi_server (unknown line)
Allocations: 2060272 (Pool: 2059240; Big: 1032); GC: 3
Segmentation fault
@sverhoeven
Copy link
Member Author

Now trying to write c wrapper around julia in https://github.com/eWaterCycle/grpc4bmi/tree/julia/test/heat-images/c-julia

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sverhoeven
Copy link
Member Author

Got it to work until it segfaults somewhere as seen in https://github.com/eWaterCycle/grpc4bmi/blob/julia/test/heat-images/c-julia/demo.ipynb. With following server output:

docker build -t heat:c-julia -f test/heat-images/c-julia/Dockerfile .
docker run -ti --rm -p 55555:55555 heat:c-julia
BMI grpc server attached to server address 0.0.0.0:55555
GetComponentName1
Heat.Model at time = 0.0 and mean temperature = 0.43864346382398506
BMI.get_component_name(model)
GetComponentName1

[1] signal (11.1): Segmentation fault
in expression starting at none:0
Allocations: 2058140 (Pool: 2057109; Big: 1031); GC: 3

I think this has to do with grpc creating threads for each incoming request. If BMI.initialize was not called in current thread then segfault happens.

I tried to limit max threads to 2, but segfaults still happen, but less frequently.

https://scientificcoder.com/extreme-multi-threading-c-and-julia-19-integration looks like what I need to use.

I do not know how to make it run without segfaulting, maybe @abelsiqueira or @goord know a way?

@abelsiqueira
Copy link
Contributor

I can reproduce the error :)

I tried some things for a couple of hours, but I've made no progress. Using mtx.lock() and unlock is not enough. Using jl_adopt_thread() creates a different segmentation fault at initialization:

[1] signal (6.-6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_init_threadtls at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/threading.c:333
ijl_adopt_thread at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/threading.c:413
_ZN8BmiJulia10InitializeENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at /usr/local/bin/run_bmi_server (unknown line)
...

which means nothing to me.

The other functions jl_gc_safe_enter and jl_enter_threaded_region were not found by the compiler. Haven't checked why.

Then I tried PackageCompiler, but you need to define the functions as ccallable, and that requires changing some code from bmi-example or something, and I don't have experience doing that. The most basic test failed because the type Model was not concrete (although I think it is).

Overall, I don't have enough knowledge of the project, nor of C++, nor of Julia+C++ integration, nor threads, so at this point I am just randomly testing things.

Let me know if you need a longer consultation to try to make this work, and sorry for not being more helpful.

@sverhoeven
Copy link
Member Author

I can reproduce the error :)

I tried some things for a couple of hours, but I've made no progress. Using mtx.lock() and unlock is not enough. Using jl_adopt_thread() creates a different segmentation fault at initialization:

[1] signal (6.-6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_init_threadtls at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/threading.c:333
ijl_adopt_thread at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/threading.c:413
_ZN8BmiJulia10InitializeENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at /usr/local/bin/run_bmi_server (unknown line)
...

which means nothing to me.

The other functions jl_gc_safe_enter and jl_enter_threaded_region were not found by the compiler. Haven't checked why.

Then I tried PackageCompiler, but you need to define the functions as ccallable, and that requires changing some code from bmi-example or something, and I don't have experience doing that. The most basic test failed because the type Model was not concrete (although I think it is).

Overall, I don't have enough knowledge of the project, nor of C++, nor of Julia+C++ integration, nor threads, so at this point I am just randomly testing things.

Let me know if you need a longer consultation to try to make this work, and sorry for not being more helpful.

Thanks for looking into this. We will go for wrapping the Julia model directly in Python with juliacall without grpc in-between the Python client and model.

@BSchilperoort you already have this working in https://github.com/eWaterCycle/pywflow, but it would be nice if BmiJulia class could be used there.

@BSchilperoort
Copy link
Member

I modified bmi_julia_model.py, as the numpy arrays have to be converted before passing to Julia.

@sverhoeven
Copy link
Member Author

Hmm test gets stuck in test_subproc

@sverhoeven sverhoeven marked this pull request as ready for review October 26, 2023 12:22
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

Copy link
Member

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame that Julia doesn't play nice with gRPC 😞

The local model works well. I am able to use it for Wflow.jl, and haven't run into any issues anymore (except the init thing, see the comment).

jl.seval("import " + package4implementation)
model = jl.seval(model_name)
implementation = jl.seval(implementation_name)
return BmiJulia(model, implementation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this: you cannot add this to an eWaterCycle LocalModel. That requires a callable class, i.e. one that is not initialized yet.

For Wflow.jl I had to do the following to make it work with eWaterCycle:

class WflowBmi(BmiJulia):
    def __init__(self):
        m = self.from_name("Wflow.Model", implementation_name="Wflow.BMI")
        super().__init__(m.model, m.implementation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it made sense for grpc server from cli, but not as a super class.

Created #145

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as it does not work, do we want to leave these files here? Do we want to attempt this again later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep it because it was a lot of work, and its already located in a nook in the repo.

Copy link
Member Author

@sverhoeven sverhoeven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at it.

jl.seval("import " + package4implementation)
model = jl.seval(model_name)
implementation = jl.seval(implementation_name)
return BmiJulia(model, implementation)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it made sense for grpc server from cli, but not as a super class.

Created #145

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep it because it was a lot of work, and its already located in a nook in the repo.

@sverhoeven sverhoeven merged commit 37040d6 into main Oct 30, 2023
3 checks passed
@sverhoeven sverhoeven deleted the julia branch October 30, 2023 10:05
@sverhoeven sverhoeven mentioned this pull request Nov 13, 2023
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants