Sometimes BmiClientDocker construction hangs #16

Open
sverhoeven opened this issue Sep 6, 2018 · 4 comments
Labels
bug Something isn't working

Comments

@sverhoeven
Member

When doing:

from grpc4bmi.bmi_client_docker import BmiClientDocker
model = BmiClientDocker(image='ewatercycle/walrus-grpc4bmi', image_port=55555)

Sometimes the interpreter hangs. When it is killed, the stack trace is:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-6-b9bed7de7b73> in <module>()
      2 model = BmiClientDocker(image='ewatercycle/walrus-grpc4bmi', image_port=55555,
      3                         input_dir="./input",
----> 4                         output_dir="./output")

/usr/local/lib/python3.5/dist-packages/grpc4bmi/bmi_client_docker.py in __init__(self, image, image_port, host, input_dir, output_dir, user)
     40                                                remove=True,
     41                                                detach=True)
---> 42         super(BmiClientDocker, self).__init__(BmiClient.create_grpc_channel(port=port, host=host))
     43 
     44     def __del__(self):

/usr/local/lib/python3.5/dist-packages/grpc4bmi/bmi_grpc_client.py in __init__(self, channel, timeout, stub)
     27             self.stub = bmi_pb2_grpc.BmiServiceStub(c)
     28             future = grpc.channel_ready_future(c)
---> 29             future.result(timeout=timeout)
     30         else:
     31             self.stub = stub

/usr/local/lib/python3.5/dist-packages/grpc/_utilities.py in result(self, timeout)
    132 
    133     def result(self, timeout=None):
--> 134         self._block(timeout)
    135         return None
    136 

/usr/local/lib/python3.5/dist-packages/grpc/_utilities.py in _block(self, timeout)
     78                 else:
     79                     if until is None:
---> 80                         self._condition.wait()
     81                     else:
     82                         remaining = until - time.time()

/usr/lib/python3.5/threading.py in wait(self, timeout)
    291         try:    # restore state no matter what (e.g., KeyboardInterrupt)
    292             if timeout is None:
--> 293                 waiter.acquire()
    294                 gotit = True
    295             else:

KeyboardInterrupt: 
sverhoeven added the bug label Sep 6, 2018
@sverhoeven
Member Author

With a time.sleep(1) added after the container is started, I was able to connect without hangs for 20+ attempts, while without the sleep it hung after about 5 attempts.

I think the channel_ready_future is called when the container is not yet up. Waiting a bit gives the container enough time to start up.

Sleep is a bit hacky. I propose we wait until the container is running, or even healthy, using the healthcheck.
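
A minimal sketch of what such a wait could look like, for illustration only. `wait_for_container` is a hypothetical helper, not part of grpc4bmi. It assumes a docker-py style container object with `reload()`, `status` and `attrs`, and it assumes the health state (if the image defines a healthcheck) is found under `attrs["State"]["Health"]["Status"]`. A deadline keeps the wait itself from hanging.

```python
import time


def wait_for_container(container, timeout=30.0, interval=0.1):
    """Poll until the container is running (and healthy, if it has a healthcheck).

    Hypothetical helper: `container` is assumed to behave like a docker-py
    Container (reload(), .status, .attrs). Raises TimeoutError on deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        container.reload()  # refresh the cached state from the Docker daemon
        status = container.status
        # Health is only present when the image defines a healthcheck.
        health = container.attrs.get("State", {}).get("Health", {}).get("Status")
        if status in ("exited", "dead"):
            raise RuntimeError("container stopped before becoming ready")
        if status == "running" and health in (None, "healthy"):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "container not ready after %.1fs (status=%s, health=%s)"
                % (timeout, status, health)
            )
        time.sleep(interval)
```

Note that a running container does not guarantee that the gRPC server inside it is already listening, so this only narrows the race unless the image has a healthcheck that probes the server itself.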

@goord
Collaborator

goord commented Sep 6, 2018

Yeah, I had a sleep there originally, but I removed it to speed up the test suite.

@sverhoeven
Member Author

Instead of sleep I used:

        self.container.reload()
        while self.container.status == 'created':
            time.sleep(0.1)
            self.container.reload()

The status is running after the while loop.

Still hangups occur.
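
Since the traceback shows the hang inside `future.result(timeout=timeout)` with no timeout set, one possible workaround is to wait with a finite timeout and recreate the channel when the wait fails. The sketch below is illustrative only. `connect_with_retries` and its parameters are hypothetical names, and the channel factory and readiness check are passed in as callables so that nothing is assumed about grpc4bmi internals.

```python
import time


def connect_with_retries(make_channel, wait_ready, attempts=5, timeout=2.0,
                         pause=0.5, retry_on=(Exception,)):
    """Create a channel and wait for readiness with a finite timeout.

    Hypothetical helper: make_channel() returns a new channel and
    wait_ready(channel, timeout) raises one of `retry_on` if the channel
    is not ready in time. A fresh channel is created for every attempt.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        channel = make_channel()
        try:
            wait_ready(channel, timeout)
            return channel
        except retry_on as err:
            last_error = err
            # Close the failed channel if it supports closing.
            close = getattr(channel, "close", None)
            if callable(close):
                close()
            if attempt < attempts:
                time.sleep(pause)  # give the server some time to come up
    raise ConnectionError(
        "channel not ready after %d attempts" % attempts
    ) from last_error
```

With the `grpc` package, `wait_ready` could be something like `lambda ch, t: grpc.channel_ready_future(ch).result(timeout=t)` with `retry_on=(grpc.FutureTimeoutError,)`. Whether a fresh channel actually avoids the hang seen here is an assumption, not something established in this thread.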

@sverhoeven
Member Author

When grpc.channel_ready_future is removed, I get

_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Socket closed"
	debug_error_string = "{"created":"@1536920082.125454765","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket closed","grpc_status":14}"

on calling the initialize method.

sverhoeven added a commit that referenced this issue Sep 14, 2018
bpmweel changed the title from "Sometimes BmiClientDocker contruction hangs" to "Sometimes BmiClientDocker construction hangs" Feb 18, 2019
sverhoeven added a commit that referenced this issue Mar 29, 2019
Less likely to become stuck

Refs #16
2 participants