
write_deltalake fails on Databricks volume #2540

Closed

Bernolt opened this issue May 26, 2024 · 3 comments
Labels: bug

Comments

Bernolt (Contributor) commented May 26, 2024

Environment

Delta-rs version: 0.17.4

Binding: python (pyarrow engine)

Environment:

  • Cloud provider: Azure
  • OS:
  • Other: Databricks runtime 13.3 LTS

Bug

What happened:
From a Python application running on a Databricks cluster, I want to write to an append-only Delta table.
The function is called as follows:

write_deltalake(
    data=arrow_table,
    table_or_uri="/Volumes/catalog/schema/volume_path/table_path",
    mode="append",
    overwrite_schema=False,
)

However, I am getting the following error:

OSError: Generic LocalFileSystem error: Unable to copy file from /Volumes/catalog/schema/volume_path/table_path/_delta_log/_commit_e964ab56-f56c-403a-b06d-fe2b6bcabf9d.json.tmp to /Volumes/catalog/schema/volume_path/table_path/_delta_log/00000000000000000000.json: Function not implemented (os error 38)

What you expected to happen:
Since Databricks supports copy/rename/delete operations on volumes, I expected the write to succeed.
As far as I know, Databricks uses a local file system API that emulates a filesystem on top of cloud storage.
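
For context, delta-rs commits to a local path by writing a temporary _commit_*.json.tmp file and then promoting it with an atomic copy-if-not-exists, which object_store's LocalFileSystem implements (to my understanding) via a hard link. FUSE mounts often leave that syscall unimplemented, which is exactly what surfaces as os error 38 (ENOSYS). A minimal probe for this, assuming a FUSE-mounted volume path (the path below is a placeholder):

import errno
import os

src = "/Volumes/catalog/schema/volume/table_path/probe.tmp"
dst = "/Volumes/catalog/schema/volume/table_path/probe.link"

open(src, "w").close()
try:
    # Hard links are one operation FUSE filesystems commonly do not
    # implement; an unimplemented syscall raises OSError with errno 38.
    os.link(src, dst)
    os.remove(dst)
    print("hard links supported here")
except OSError as e:
    print(f"hard link failed: errno={e.errno} ({errno.errorcode.get(e.errno)})")
finally:
    os.remove(src)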

How to reproduce it:
I made the notebook below to reproduce the error; it needs to be run on a Databricks Runtime.

# Databricks notebook source
# MAGIC %sh
# MAGIC touch /Volumes/catalog/schema/volume/table_path/to_rename.tmp

# COMMAND ----------

# MAGIC %sh
# MAGIC mv /Volumes/catalog/schema/volume/table_path/to_rename.tmp /Volumes/catalog/schema/volume/table_path/renamed.todelete

# COMMAND ----------

# MAGIC %sh 
# MAGIC rm /Volumes/catalog/schema/volume/table_path/renamed.todelete

# COMMAND ----------

from deltalake import write_deltalake
import pyarrow as pa

# COMMAND ----------

arrow_table = pa.table([
    pa.array([2, 4, 5, 100]),
    pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
    ], names=['n_legs', 'animals'])

# COMMAND ----------

write_deltalake(table_or_uri = "/Volumes/catalog/schema/volume/table_path/reproduce_deltars_error_table_01",
                data = arrow_table,
                mode = "append",
                overwrite_schema=False)


Bernolt added the bug label May 26, 2024
Bernolt (Author) commented May 26, 2024

This might not be a bug from delta-rs's perspective; however, it would be helpful to have some insight into the underlying file system operations being performed.

ion-elgreco (Collaborator) commented May 26, 2024

AFAIK, Databricks volumes are FUSE-mounted, so this is not a bug. If you want to write to mounted storage that doesn't support CopyIfNotExists, you can pass this to the writer:

storage_options = {"allow_unsafe_rename": "true"}
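
Applied to the original call, that looks like the sketch below (the path is the reporter's placeholder). Note that allow_unsafe_rename replaces the atomic commit with a plain rename, so it is only safe when a single writer owns the table:

from deltalake import write_deltalake

write_deltalake(
    table_or_uri="/Volumes/catalog/schema/volume_path/table_path",
    data=arrow_table,
    mode="append",
    # Fall back to a plain rename instead of copy-if-not-exists; concurrent
    # writers could overwrite each other's commits, so use with care.
    storage_options={"allow_unsafe_rename": "true"},
)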

ion-elgreco closed this as not planned May 26, 2024
Bernolt (Author) commented May 26, 2024

Thanks, solved my issue.
