[BUG] Getting started guide, can't run on local cluster #2884

Closed
2 tasks done
tamis-laan opened this issue Sep 15, 2022 · 11 comments

@tamis-laan

Describe the bug

I'm trying out Flyte locally running through the getting started guide:
https://docs.flyte.org/en/latest/getting_started/index.html

The code runs great using pyflyte but doesn't work properly when running on the local demo cluster.

running:

> pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2

results in the error:

{"asctime": "2022-09-15 13:06:21,069", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.069160907+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-15 13:06:21,270", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.269916921+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/tux/.local/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 539, in _run
    remote_entity = remote.register_script(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/remote/remote.py", line 596, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 113, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
	debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-15T13:06:21.670827636+02:00", grpc_status:14}"

Expected behavior

The example code should run on the local cluster I created using:

> flytectl demo start

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@tamis-laan tamis-laan added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Sep 15, 2022
@welcome

welcome bot commented Sep 15, 2022

Thank you for opening your first issue here! 🛠

@eapolinario
Contributor

We suspect that it's an initialization issue. We'll investigate and comment here with what we discover.

@tamis-laan
Author

@eapolinario
This is standing in the way of evaluating Flyte for our use case. Is there a working version I can revert to?

@eapolinario
Contributor

@tamis-laan, first of all, sorry for your trouble. Can you say a bit more about how you're running this? My initial suspicion turned out not to be true, so I'm very interested in knowing what's happening in your case.

For example, after you run flytectl demo start, are you able to open http://localhost:30080/console? Can you also double-check the existence of the config file mentioned after flytectl demo start finishes? (In other words, you should be seeing a file called ~/.flyte/config.yaml.)
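
The two checks above can be scripted. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the URL and file paths are simply the ones mentioned in this thread (which config file exists may depend on the flytectl version).

```python
import urllib.error
import urllib.request
from pathlib import Path

# Paths and URL as mentioned in this thread; adjust for your setup.
CANDIDATE_CONFIGS = ("~/.flyte/config.yaml", "~/.flyte/config-sandbox.yaml")
CONSOLE_URL = "http://localhost:30080/console"


def existing_configs(candidates=CANDIDATE_CONFIGS):
    """Return the candidate config paths that exist as regular files."""
    paths = (Path(p).expanduser() for p in candidates)
    return [p for p in paths if p.is_file()]


def console_reachable(url=CONSOLE_URL, timeout=5.0):
    """Return True if an HTTP GET to the console URL gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except OSError:
        # Connection refused, name resolution failure, timeout, and similar.
        return False


if __name__ == "__main__":
    print("config files found:", [str(p) for p in existing_configs()])
    print("console reachable:", console_reachable())
```

Note that a reachable console only shows that the HTTP port answers; it says nothing about the gRPC endpoint on port 30081.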

@eapolinario
Contributor

@tamis-laan, it's also worth checking whether you're setting the environment variables mentioned after flytectl demo start finishes. For example:

❯ flytectl demo start --source .
...
+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-p52mz | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-724hd                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-ln8s8                      | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/eduardo/.kube/config:/home/eduardo/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/eduardo/.flyte/config-sandbox.yaml
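
A quick way to confirm that both variables are set and point at real files is sketched below. The function name is hypothetical, and it assumes KUBECONFIG is a list of paths joined by the platform's path separator, as in the export line above.

```python
import os
from pathlib import Path


def check_flyte_env(env=None):
    """Map each variable to the list of (path, exists) pairs it names.

    A missing or empty variable maps to an empty list.
    """
    env = os.environ if env is None else env
    report = {}
    for name in ("KUBECONFIG", "FLYTECTL_CONFIG"):
        value = env.get(name, "")
        # KUBECONFIG may hold several paths joined by os.pathsep (":" on Linux).
        # "$KUBECONFIG:..." with the variable unset leaves a leading separator,
        # which produces an empty entry that is skipped here.
        parts = [p for p in value.split(os.pathsep) if p]
        report[name] = [(p, Path(p).expanduser().is_file()) for p in parts]
    return report


if __name__ == "__main__":
    for name, entries in check_flyte_env().items():
        print(name, entries if entries else "is not set")
```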

@tamis-laan
Author

The console is reachable, but I don't have a ~/.flyte/config.yaml. I do have a ~/.flyte/config-sandbox.yaml:

   admin:
     # For GRPC endpoints you might want to use dns:///flyte.myexample.com
     endpoint: localhost:30081
     authType: Pkce
     insecure: true
   logger:
     show-source: true
     level: 0
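
Since the error reports a failed A-record lookup for localhost, one thing worth comparing is what the operating system's own resolver returns for the endpoint in this config. The sketch below is an assumption-laden diagnostic, not part of Flyte: the helper names are hypothetical, and it exercises only the OS resolver through Python's socket module, not the resolver used inside the gRPC client.

```python
import socket


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port).

    Assumes no scheme prefix and no IPv6 literal in the endpoint.
    """
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def os_resolver_addresses(host, port, family=socket.AF_INET):
    """Return the sorted, de-duplicated addresses from the OS resolver, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host, port = split_endpoint("localhost:30081")
    print("IPv4:", os_resolver_addresses(host, port, socket.AF_INET))
    print("IPv6:", os_resolver_addresses(host, port, socket.AF_INET6))
```

If the OS resolver returns an IPv4 address here while the gRPC client still fails, the problem is more likely in the client's resolver than in the machine's hosts configuration.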

@tamis-laan
Author

tamis-laan commented Sep 21, 2022

+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Pending       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx                      | Pending       | flyte     |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx                      | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/tux/.flyte/config-sandbox.yaml

I have set both environment variables:

> echo $KUBECONFIG
/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
> echo $FLYTECTL_CONFIG
/home/tux/.flyte/config-sandbox.yaml

Still, I get the same error:

{"asctime": "2022-09-21 10:10:10,341", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.341471522+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-21 10:10:10,542", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.542086665+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 542, in _run
    remote_entity = remote.register_script(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/remote/remote.py", line 600, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 111, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
	debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-21T10:10:10.942963692+02:00", grpc_status:14}"

I have tested this on three Linux machines myself, and a colleague has tested it on Windows with the same result.

@tamis-laan
Author

@eapolinario

Is there a way to revert to a working version?

@eapolinario eapolinario self-assigned this Sep 23, 2022
@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Sep 23, 2022
@eapolinario
Contributor

eapolinario commented Oct 5, 2022

We sync'd offline on this.

The TL;DR is that the default DNS resolver used by the Python gRPC client (c-ares, according to the docs) is unable to resolve the name localhost. Forcing the client to use the OS's native DNS resolver (by setting the environment variable GRPC_DNS_RESOLVER=native) unblocks the issue, although it's still unclear why c-ares was failing.
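
For scripts that call flytekit directly rather than through the pyflyte CLI, the workaround above can be applied in-process. This is a minimal sketch with a hypothetical helper name. It assumes the variable has to be in place before grpc (or flytekit, which imports it) is first imported, which is the conservative choice since the gRPC runtime reads its environment when it initializes.

```python
import os


def prefer_native_resolver(env=None):
    """Set GRPC_DNS_RESOLVER=native in env (default: os.environ) if it is not already set.

    Returns the value in effect afterwards, so an explicit user setting wins.
    """
    env = os.environ if env is None else env
    return env.setdefault("GRPC_DNS_RESOLVER", "native")


if __name__ == "__main__":
    # Assumption: this runs before any import of grpc or flytekit in the process.
    prefer_native_resolver()
    print("GRPC_DNS_RESOLVER =", os.environ["GRPC_DNS_RESOLVER"])
```

When running the CLI, exporting the variable in the shell before invoking pyflyte has the same effect.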

This issue in the grpc repo suggests that we collect logs by increasing the verbosity and tracing certain components.
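
One way to collect such logs is sketched below. It relies on gRPC's GRPC_VERBOSITY and GRPC_TRACE environment variables; the tracer name cares_resolver is taken from gRPC's environment-variable documentation as best I recall it, so check it against the gRPC version in use. The helper names and the log file name are hypothetical, and the command is the one from the original report.

```python
import os
import subprocess
import sys


def tracing_env(base=None, tracers=("cares_resolver",)):
    """Return a copy of the environment with gRPC debug logging and tracing enabled."""
    env = dict(os.environ if base is None else base)
    env["GRPC_VERBOSITY"] = "debug"
    env["GRPC_TRACE"] = ",".join(tracers)
    return env


def run_with_tracing(cmd, log_path):
    """Run cmd with tracing enabled, writing its stderr to log_path; return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        return subprocess.run(cmd, env=tracing_env(), stderr=log, check=False).returncode


if __name__ == "__main__":
    cmd = ["pyflyte", "run", "--remote", "example.py", "wf",
           "--n", "500", "--mean", "42", "--sigma", "2"]
    sys.exit(run_with_tracing(cmd, "grpc-trace.log"))
```

The resulting log file can then be attached to the issue for comparison against a machine where resolution succeeds.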

@github-actions

github-actions bot commented Sep 4, 2023

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Sep 4, 2023
@github-actions

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot closed this as not planned Sep 12, 2023
@eapolinario eapolinario reopened this Nov 2, 2023
@github-actions github-actions bot removed the stale label Nov 4, 2023