Allow to specify Azurite hostname and service port as backend #2900

Closed
g0di opened this issue Sep 23, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@g0di

g0di commented Sep 23, 2024

The library supports Azurite as a backend for reading and writing Delta tables through the storage option AZURE_STORAGE_USE_EMULATOR="1". This option assumes that the Azurite server is running on 127.0.0.1 and that the blob service is listening on port 10000, which are the defaults when running Azurite locally. However, there is no way to use a different port or hostname for Azurite.

Use Case

This is limiting when you run Azurite with Docker Compose and you want your Python application, running as another service in the same compose file, to reach Azurite. When the Python application runs in its own container, it tries to connect to Azurite at 127.0.0.1:10000, but this no longer works because that address points to the Python application container itself.

Instead, the Azurite service is reachable through an alias (the name of the service in the Docker Compose file). I would like to be able to use the emulator and override the Azurite hostname (and possibly its port).
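To check which address is actually reachable from inside the app container, a small TCP probe can help. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, and `azurite` is the service name from the compose file in this issue.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and opens the socket;
        # DNS failures, refusals and timeouts are all OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (run inside the app container):
#   can_connect("127.0.0.1", 10000)  # the app container itself
#   can_connect("azurite", 10000)    # the Azurite service alias
```

Run from the app container, the first call is expected to fail and the second to succeed once Azurite is up, which matches the behaviour described above.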

Note that I also tried setting AZURE_STORAGE_ENDPOINT=http://azurite:10000/devstoreaccount1, but it has no effect. I think this variable is overridden whenever AZURE_STORAGE_USE_EMULATOR is truthy.

Example

Consider the following Docker Compose file, which starts a Python app that tries to read a table from Azurite. Both Azurite and the deltalake client are on the same Docker network.

name: deltalake-azurite-docker

services:
  app:
    build:
      context: .
      dockerfile_inline: |
        FROM python:3.11-slim
        RUN pip install deltalake
    command: python3 -c 'from deltalake import DeltaTable; DeltaTable("abfs://data/test.delta")'
    environment:
      AZURE_STORAGE_USE_EMULATOR: "1"
      AZURE_STORAGE_ENDPOINT: "http://azurite:10000/devstoreaccount1"
    depends_on:
      azurite:
        condition: service_healthy

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - 10000:10000
    healthcheck:
      test: nc 127.0.0.1 10000 -z
      interval: 1s
      retries: 30

  # Simple trick to create an empty container when starting azurite
  azurite-setup:
    image: mcr.microsoft.com/azure-cli:latest
    command: az storage container create --name data
    depends_on:
      azurite:
        condition: service_healthy
    environment:
      AZURE_STORAGE_CONNECTION_STRING: DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite:10000/devstoreaccount1;

Run it

docker compose up

You'll get the following error in the logs:

app-1            | Traceback (most recent call last):
app-1            |   File "<string>", line 1, in <module>
app-1            |   File "/usr/local/lib/python3.11/site-packages/deltalake/table.py", line 412, in __init__
app-1            |     self._table = RawDeltaTable(
app-1            |                   ^^^^^^^^^^^^^^
app-1            | OSError: Generic MicrosoftAzure error: Error after 10 retries in 2.224622054s, max_retries:10, retry_timeout:180s, source:error sending request for url (http://127.0.0.1:10000/devstoreaccount1/data/test.delta/_delta_log/_last_checkpoint)

As you can see, deltalake tries to connect to 127.0.0.1 instead of azurite.

@g0di g0di added the enhancement New feature or request label Sep 23, 2024
@g0di g0di changed the title from "Allow to use Azurite within Docker compose as backend" to "Allow to specify Azurite hostname and service port as backend" Sep 23, 2024
@ion-elgreco
Collaborator

You should refer to the object store docs for this

@g0di
Author

g0di commented Sep 23, 2024

> You should refer to the object store docs for this

Thank you for your reply. I'm not sure what you're referring to. Could you explain a bit more?

thanks!

@VillePuuska
Contributor

VillePuuska commented Sep 24, 2024

I think you need to set the env variable AZURITE_BLOB_STORAGE_URL: "http://azurite:10000/" for the app service. Adding just this changes the error message to

app-1  | Traceback (most recent call last):
app-1  |   File "<string>", line 1, in <module>
app-1  |   File "/usr/local/lib/python3.11/site-packages/deltalake/table.py", line 412, in __init__
app-1  |     self._table = RawDeltaTable(
app-1  |                   ^^^^^^^^^^^^^^
app-1  | OSError: Generic MicrosoftAzure error: Error performing list request: Client error with status 404 Not Found: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
app-1  | <Error>
app-1  |   <Code>ContainerNotFound</Code>
app-1  |   <Message>The specified container does not exist.
app-1  | RequestId:a6f3be17-1918-4ee7-b90d-66619356d6b9
app-1  | Time:2024-09-24T18:52:03.565Z</Message>
app-1  | </Error>

and adding a sleep to the Python script to wait for container creation changes the error message to

app-1  | Traceback (most recent call last):
app-1  |   File "<string>", line 1, in <module>
app-1  |   File "/usr/local/lib/python3.11/site-packages/deltalake/table.py", line 412, in __init__
app-1  |     self._table = RawDeltaTable(
app-1  |                   ^^^^^^^^^^^^^^
app-1  | _internal.TableNotFoundError: no log files

so seems like it's connecting properly.
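Instead of a fixed sleep, the wait for the container could be a short retry loop. This is only a sketch: `wait_until` is a hypothetical helper, not part of deltalake, and it retries on any exception, so a real script would narrow that to the error it actually expects.

```python
import time


def wait_until(check, timeout: float = 30.0, interval: float = 1.0):
    """Call check() until it returns without raising.

    Re-raises the last exception once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return check()
        except Exception:
            # Give up only after the deadline; otherwise pause and retry.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Example (assumes deltalake is installed and the table exists):
#   from deltalake import DeltaTable
#   table = wait_until(lambda: DeltaTable("abfs://data/test.delta"))
```

In the compose setup from this issue the table itself is never created, so the loop would still end in the "no log files" error after the timeout.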

No idea where this might actually be documented, but I found it in the object_store source https://docs.rs/object_store/latest/src/object_store/azure/builder.rs.html#827 and in the integration test workflow:

AZURITE_BLOB_STORAGE_URL: "http://localhost:10000"
😅
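For anyone setting this from Python rather than from the compose file, here is a minimal sketch. It assumes the variables are read from the process environment when the table is opened, which is what the object_store source linked above suggests. `azurite_env` and `open_table` are hypothetical helper names, and the `azurite` host and port 10000 come from the compose file in this issue.

```python
import os


def azurite_env(host: str = "azurite", port: int = 10000) -> dict:
    """Build the two environment variables discussed in this thread."""
    return {
        "AZURE_STORAGE_USE_EMULATOR": "1",
        "AZURITE_BLOB_STORAGE_URL": f"http://{host}:{port}",
    }


def open_table(uri: str = "abfs://data/test.delta"):
    """Export the variables, then open the table (requires deltalake)."""
    os.environ.update(azurite_env())
    # Imported lazily so azurite_env() can be used without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(uri)
```

Setting the same two variables under `environment:` for the app service in the compose file should be equivalent.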

@g0di
Author

g0di commented Sep 25, 2024

I missed that configuration option! Thank you, that should do the trick.

@g0di g0di closed this as completed Sep 25, 2024