OpenPMD interface fails after LAMMPS usage #476
Comments
Sounds interesting, I'll have a look |
Hey Lenz @RandomDefaultUser, this file is not part of the default sample data, right? If so, can you share it with me somehow, please?
# Trigger LAMMPS by performing inference on an atomic snapshot.
parameters, network, data_handler, predictor = mala.Predictor.\
    load_run("be_model", path="basic") |
I replaced that line with something else from the test that looked similar, resulting in:
#!/usr/bin/env python
import os
from ase.io import read
import mala
from mala.datahandling.data_repo import data_repo_path
data_path = os.path.join(data_repo_path, "Be2")
# Trigger LAMMPS by performing inference on an atomic snapshot.
parameters, network, data_handler, predictor = mala.Predictor.load_run(
    "workflow_test", path=os.path.join(data_repo_path, "workflow_test")
)
parameters.targets.target_type = "LDOS"
parameters.targets.ldos_gridsize = 11
parameters.targets.ldos_gridspacing_ev = 2.5
parameters.targets.ldos_gridoffset_ev = -5
parameters.running.inference_data_grid = [18, 18, 27]
parameters.descriptors.descriptor_type = "Bispectrum"
parameters.descriptors.bispectrum_twojmax = 10
parameters.descriptors.bispectrum_cutoff = 4.67637
parameters.targets.pseudopotential_path = data_path
predicted_ldos = predictor.predict_from_qeout(
    os.path.join(data_path, "Be_snapshot3.out")
)
ldos_calculator: mala.LDOS
ldos_calculator = data_handler.target_calculator
ldos_calculator.read_additional_calculation_data(
    os.path.join(data_path, "Be_snapshot3.out"), "espresso-out"
)
ldos_calculator.read_from_array(predicted_ldos)
# total_energy_traditional = ldos_calculator.total_energy
# parameters.descriptors.use_atomic_density_energy_formula = True
ldos_calculator.read_from_array(predicted_ldos)
# Test OpenPMD.
params = mala.Parameters()
ldos_calculator = mala.LDOS.from_numpy_file(
    params, os.path.join(data_path, "Be_snapshot1.out.npy")
)
ldos_calculator.read_additional_calculation_data(
    os.path.join(data_path, "Be_snapshot1.out"), "espresso-out"
)
# Write and then read in via OpenPMD and make sure all the info is
# retained.
ldos_calculator.write_to_openpmd_file(
    "test_openpmd.h5", ldos_calculator.local_density_of_states
)
This runs without problems for me. A bug like this might depend on the specific setup that you are using; can you please tell me:
|
What's a bit weird: According to your backtrace, the error occurs very early during construction of the
|
Some weirdness seems to be going on in the linker: for some reason, the openPMD shared library resolves C++ STL symbols from the Lammps shared library. When compiling a GPU-aware Lammps, this is likely to lead to ABI incompatibilities.
I honestly have no idea how this even happens. For now, a workaround is just adding |
Thanks for the investigation! I am glad that the error is reproducible, that helps a lot. At least now we know where to look... |
Hi, is it possible that some components (lammps, openPMD-api) are not built with the same compilers / stdlibs? I see that lammps.so was built with nix while openPMD-api came from which source? I suspect that something in the lammps build exposes or overwrites symbols of the stdlib or some other incompatibility in build toolchains is going on. |
Thank you for looking at this, Axel! I built both openPMD and Lammps with Nix and their dependencies should be compatible. The dynamically linked dependencies are:
Here is the diff of both shared objects' dependencies:
However, Lammps is built with nvcc+gcc11.3.0 while openPMD is directly built with gcc11.3.0. What seems weird is that Lammps does not link to libstdc++.so.6 at all, but somehow still carries its symbols.
So we should probably ask the Lammps developers if their code does anything that could be causing this? |
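One way to check the observation above (a shared object carrying C++ standard library symbols without linking to libstdc++.so.6) is to look at the dynamic symbols it defines. The following is only a rough sketch: it assumes GNU `nm` is available, the helper names are made up for illustration, and the prefix list is a heuristic based on the Itanium C++ name mangling, not an exhaustive test.

```python
import subprocess

# Heuristic: mangled-name prefixes for entities in namespace std
# under the Itanium C++ ABI ("St" abbreviates "std::").
STD_PREFIXES = ("_ZNSt", "_ZNKSt", "_ZSt", "_ZTVSt", "_ZTISt", "_ZTSSt")


def defined_std_symbols(nm_output):
    """Return std:: symbols that are *defined* in `nm -D` style output."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined ones ("U name") have only two and are skipped.
        if len(parts) != 3:
            continue
        name = parts[2].split("@")[0]  # drop a symbol version suffix
        if name.startswith(STD_PREFIXES):
            found.append(name)
    return found


def looks_like_libcxx(name):
    """libc++ places its symbols in the inline namespace std::__1."""
    return "St3__1" in name


def dynamic_symbols(path):
    """Run GNU nm on a shared object (hypothetical helper, needs binutils)."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `defined_std_symbols(dynamic_symbols(path))` for both the Lammps and the openPMD shared objects (paths depend on the local build) would show whether either of them exports standard library symbols itself.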
I have tried looking into this once more, and I think that I have found out what caused the issue on my end. Since the symptoms on your end seem to be the same, it's likely that we're looking at the same thing here.
In the failing build environment, I had built Lammps with NVCC, but my Kokkos build was with Clang (I had had issues with a gcc build and picked Clang as an alternative). So, openPMD-api and Lammps were referring to two different C++ standard libraries that are ABI-incompatible but use the same symbol names. Since a symbol cannot exist twice in the same application context, whichever library loads its symbols first wins. Hence the error being suppressible by adding an early
I tested my environment from back then again and can still reproduce the issue.
TLDR: This is likely not a bug, but rather a wrong software environment with incompatible dependencies. Do you still know how you had set up your environment for this bug to occur? @RandomDefaultUser Would also be interesting to see if adding a |
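To see whether two different C++ standard libraries really end up in the same Python process, one could inspect the process's memory map. This is a minimal, Linux-only sketch; the function names are hypothetical, and it only detects libraries by file name (libstdc++ from GCC, libc++ from Clang/LLVM), so a statically embedded standard library would not show up.

```python
import os


def cxx_runtimes_in_maps(maps_text):
    """Return paths of C++ runtime libraries listed in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5]
        if os.path.basename(path).startswith(("libstdc++", "libc++")):
            paths.add(path)
    return paths


def cxx_runtimes_in_this_process():
    """Check the current process (assumes a Linux /proc filesystem)."""
    with open("/proc/self/maps") as maps_file:
        return cxx_runtimes_in_maps(maps_file.read())
```

Printing `cxx_runtimes_in_this_process()` after importing the Python packages involved would list every C++ runtime that has been loaded so far. More than one distinct library in the result would be consistent with the explanation above, although it does not by itself prove an ABI clash.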
When investigating a problem with the test pipeline, I stumbled upon the fact that attempting an OpenPMD write after LAMMPS has been used in any capacity will result in a crash. An MWE to reproduce this problem (assuming the model from the basic examples is present) is:
This results in
For good measure one may throw in a
mala.finalize()
before the OpenPMD part, which calls the lammps.finalize()
function, but this does not affect the error in any way.