
PyFluent: Problem while submitting jobs to Windows job scheduler. #959

Closed
2 tasks done
ypatel-qa opened this issue Sep 30, 2022 · 4 comments · Fixed by #966
Labels
bug Issue, problem or error in PyMAPDL

Comments

@ypatel-qa
Contributor

🔍 Before submitting the issue

  • I have searched among the existing issues
  • I am using a Python virtual environment

🐞 Description of the bug

I am unable to submit PyFluent jobs to the Windows job scheduler.

I am trying three scenarios using the HPC Pack 2019 Job Manager:

(1) Run a simple .py script ("Hello World") via the job manager. Works fine.
(2) Run a Fluent journal via the job manager. Works fine.
(3) Run a PyFluent script via the job manager. This fails: Fluent does not launch from s = pyfluent.launch_fluent(precision="double",additional_arguments=parOpt,mode="solver"), producing the output below.

Number of machines = 1
Number of cores = 1
Min number of cores = 1
Max number of cores = 1
Parallel options for fluent are = -t1 -cnf=CDCFLUENTW16V01:1
Traceback (most recent call last):
  File "run.py", line 14, in <module>
    s = pyfluent.launch_fluent(precision="double",additional_arguments=parOpt,mode="solver")
  File "c:\shared\pyfluent\pyfluent\src\ansys\fluent\core\launcher\launcher.py", line 541, in launch_fluent
    _await_fluent_launch(server_info_filepath, start_timeout, sifile_last_mtime)
  File "c:\shared\pyfluent\pyfluent\src\ansys\fluent\core\launcher\launcher.py", line 328, in _await_fluent_launch
    raise RuntimeError("The launch process has been timed out.")
RuntimeError: The launch process has been timed out.
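Judging only by the argument names in the traceback (server_info_filepath, start_timeout, sifile_last_mtime), the launcher appears to wait for Fluent to write or update a server-info file and gives up after start_timeout seconds. The sketch below is a hypothetical re-creation of that kind of wait loop, not PyFluent's actual code; the function name await_file_update and the poll_interval parameter are illustrative assumptions. It shows why a Fluent process that never starts under the scheduler surfaces as a timeout rather than as a Fluent error.

```python
import os
import time


def await_file_update(path, timeout, last_mtime, poll_interval=1.0):
    """Hypothetical wait loop: return once `path` is newer than `last_mtime`.

    Raises RuntimeError if the file is not created or updated within
    `timeout` seconds, mirroring the message seen in the traceback.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The launched process signals readiness by writing the file.
        if os.path.exists(path) and os.path.getmtime(path) > last_mtime:
            return
        time.sleep(poll_interval)
    raise RuntimeError("The launch process has been timed out.")
```

If Fluent never starts (for example because it rejects its command-line arguments), the file is never written and the only symptom on the Python side is this timeout.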

📝 Steps to reproduce

CDCFLUENTW16V01 is the cluster head node. (I added Dan's, Sean's and Mainak's IDs so they can access this cluster.)
CDCFLUENTW16V02 is the child node.
Fluent is installed on the head node (CDCFLUENTW16V01) in the C:\shared\231\ANSYS Inc folder.
PyFluent (using an Ansys CPython venv) is installed in the C:\shared\pyfluent\pyfluent folder.

To reproduce:

  • Log in to the head node and unzip the attached files into your working area.
  • files.zip
  • Start HPC Job Manager and submit run.py, which uses PyFluent, as a job. This produces the error above (the problem is the same with PYFLUENT_SHOW_SERVER_GUI=0 or 1).
  • Adding os.environ['FLUENT_LM_CHECK_DISABLE'] = '1' to run.py does not help either.
  • With additional_arguments="-ptrace", Fluent writes mpt* log files when run without the job scheduler. With the job scheduler, Fluent is not launched at all.
  • It looks like Fluent does not launch with the PyFluent + job scheduler combination.
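The attached run.py is not reproduced here, but the log above shows it printing parallel options such as -t1 -cnf=CDCFLUENTW16V01:1. The following is a hypothetical sketch of how such options could be built inside an HPC Pack job, assuming the scheduler's CCP_NODES environment variable with the format "node_count host1 cores1 host2 cores2 ...". The helper names build_par_options and launch_from_job are illustrative; only precision, additional_arguments and mode are taken from the launch_fluent call in this issue.

```python
import os


def build_par_options(ccp_nodes):
    """Turn 'N host1 cores1 host2 cores2 ...' into Fluent -t/-cnf options.

    Assumes the CCP_NODES layout described above; this is not taken from run.py.
    """
    tokens = ccp_nodes.split()
    node_count = int(tokens[0])
    pairs = tokens[1:1 + 2 * node_count]
    hosts = [(pairs[i], int(pairs[i + 1])) for i in range(0, len(pairs), 2)]
    total_cores = sum(cores for _, cores in hosts)
    cnf = ",".join(f"{host}:{cores}" for host, cores in hosts)
    return f"-t{total_cores} -cnf={cnf}"


def launch_from_job():
    """Launch a Fluent solver session using the job's node allocation."""
    # Imported lazily so the option builder can be used without PyFluent.
    import ansys.fluent.core as pyfluent

    par_opt = build_par_options(os.environ["CCP_NODES"])
    return pyfluent.launch_fluent(
        precision="double", additional_arguments=par_opt, mode="solver"
    )
```

For a single-core allocation on the head node, build_par_options("1 CDCFLUENTW16V01 1") yields the same string that appears in the log.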


Fluent itself is working fine.

💻 Which operating system are you using?

Windows

🐍 Which Python version are you using?

3.7

📦 Installed packages

C:\shared\pyfluent\Scripts>C:\shared\pyfluent\Scripts\python.exe -m pip freeze
ansys-api-fluent==0.3.1
ansys-api-platform-instancemanagement==1.0.0b3
-e git+https://github.com/pyansys/pyfluent.git@33853f385c1fdce40a0525fce6240388e8b55d55#egg=ansys_fluent_core
ansys-platform-instancemanagement==1.0.2
appdirs==1.4.4
googleapis-common-protos==1.56.4
grpcio==1.49.1
h5py==3.7.0
importlib-metadata==4.12.0
numpy==1.21.6
packaging==21.3
pandas==1.3.5
Pint==0.18
protobuf==3.20.2
protoc-gen-swagger==0.1.0
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.2.1
six==1.16.0
typing_extensions==4.3.0
zipp==3.8.1

C:\shared\pyfluent\Scripts>
ypatel-qa added the bug (Issue, problem or error in PyMAPDL) label on Sep 30, 2022
@ypatel-qa
Contributor Author

@dnwillia-work, as discussed in our Teams chat.

dnwillia-work self-assigned this on Oct 3, 2022
dnwillia-work linked a pull request on Oct 3, 2022 that will close this issue
@dnwillia-work
Collaborator

dnwillia-work commented Oct 3, 2022

So, one problem is that the Windows HPC job manager returns host names in upper case, while the Python socket library's gethostname() call returns the host name in lower case. Because of this mismatch, the filtering that drops the -cnf argument in the local parallel case is bypassed, and Fluent does not seem to like having -cnf passed there. I've worked around that, and Fluent now starts up.
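A minimal sketch of the case-insensitive comparison described above, assuming a -cnf value of the form host:cores[,host:cores...] as seen in the log. The function names are hypothetical and this is not the actual change in #966; it only illustrates why comparing CDCFLUENTW16V01 with cdcfluentw16v01 must ignore case. Stripping any domain suffix is an extra assumption for fully qualified names.

```python
import socket


def _normalize(host):
    # Compare short names case-insensitively (domain suffix dropped: an assumption).
    return host.strip().split(".")[0].lower()


def _is_local_only(cnf_arg, local_host):
    """True if every host in '-cnf=host:n,...' is the local machine."""
    entries = cnf_arg.split("=", 1)[1].split(",")
    names = {_normalize(entry.split(":", 1)[0]) for entry in entries}
    return names == {_normalize(local_host)}


def filter_parallel_args(args, local_host=None):
    """Drop -cnf=... when all listed hosts are local; keep everything else."""
    if local_host is None:
        local_host = socket.gethostname()
    kept = []
    for arg in args.split():
        if arg.startswith("-cnf=") and _is_local_only(arg, local_host):
            continue  # local parallel run: Fluent does not need -cnf
        kept.append(arg)
    return " ".join(kept)
```

With the upper-case name from the job manager and a lower-case local host name, filter_parallel_args("-t1 -cnf=CDCFLUENTW16V01:1", "cdcfluentw16v01") returns "-t1", while a two-node -cnf list is left untouched.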

That said, Fluent also does not seem to start properly unless you pass -gu -driver null through additional_arguments to launch_fluent. I still need to look into that.
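A hypothetical sketch of appending the -gu -driver null workaround to the parallel options before calling launch_fluent. The helper names are illustrative, not part of PyFluent; only precision, additional_arguments and mode come from the launch_fluent call in this issue.

```python
# Flags reported in this thread as needed for Fluent to start under the scheduler.
HEADLESS_FLAGS = "-gu -driver null"


def compose_additional_arguments(par_opt, headless=True):
    """Join the parallel options with the headless workaround flags."""
    parts = [par_opt.strip()] if par_opt.strip() else []
    if headless:
        parts.append(HEADLESS_FLAGS)
    return " ".join(parts)


def launch_solver(par_opt):
    # Imported lazily so the string helper works without PyFluent installed.
    import ansys.fluent.core as pyfluent

    return pyfluent.launch_fluent(
        precision="double",
        additional_arguments=compose_additional_arguments(par_opt),
        mode="solver",
    )
```

For example, compose_additional_arguments("-t1") produces "-t1 -gu -driver null".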

@ypatel-qa
Contributor Author

ypatel-qa commented Oct 3, 2022

@dnwillia-work

Hi Dan,

Your change has definitely resolved the initial Fluent start-up problem via the job scheduler. Thank you. I am summarizing my findings in three points.

  1. I can confirm the "g" (-gu) mode issue you are experiencing (GitHub issue #967). It is reproducible outside the job scheduler.
  2. With the job scheduler in shared memory (all Fluent processes spawned on the head node), Fluent now starts but hangs after start-up. This looks like GitHub issue #967, and I can see the cortex/fluent processes running in Task Manager.
  3. With the job scheduler in distributed memory (some Fluent processes on the head node and some on the child node), Fluent still does not start. The error output was posted as a screenshot.

@ypatel-qa
Contributor Author

We will revisit this issue after #967 is resolved.
