Special characters in partition path not handled locally #1299

Closed
zar3bski opened this issue Apr 18, 2023 · 0 comments · Fixed by #1661
Labels
bug Something isn't working

Environment

Delta-rs version: 0.8.1

Binding: Python

Environment:

  • OS: Ubuntu 22.04.2 LTS
  • Python: 3.10.6

Bug

What happened:
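The Parquet files of a locally written table were not found when reading it back. The table is partitioned on a column whose values contain a special character (&).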

What you expected to happen:

I expected to_pandas to load the Parquet files and return the DataFrame.

How to reproduce it:

from deltalake import DeltaTable, write_deltalake
from pandas import DataFrame
df = DataFrame(
    [
        ["Pierre", "Python", 24, "R&D"], # special character: &
        ["David", "Python", 33, "R&D"],
        ["Cyril", "Typescript", 26, "R&D"],
        ["Marie", "Excel", 36, "Commerce"],
    ],
    columns=["prenom", "skill", "age", "department"],
)
write_deltalake("./test/tables/garbage.delta", df, partition_by=["department"])
dt = DeltaTable("./test/tables/garbage.delta")

dt.to_pandas()

More details:

Traceback (most recent call last):
  File "/home/zar3bski/Documents/Code/octaave/deltastic/test/minimally_reproductible.py", line 18, in <module>
    dt.to_pandas()
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 418, in to_pandas
    return self.to_pyarrow_table(
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 400, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/fs.py", line 22, in open_input_file
    return pa.PythonFile(DeltaFileSystemHandler.open_input_file(self, path))
deltalake.PyDeltaTableError: Object at location /home/zar3bski/Documents/Code/octaave/deltastic/test/tables/garbage.delta/department=R&D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet not found: No such file or directory (os error 2)
terminate called recursively
terminate called without an active exception
[1]    189090 IOT instruction (core dumped)  poetry run python test/minimally_reproductible.py

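When I look in my project files, I find the file at test/tables/garbage.delta/department=R%2526D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet, whereas the reader looks for it under department=R&D. There seems to be a problem with the URL encoding of &: the directory name contains %2526, which is & percent-encoded twice (& becomes %26, then the % becomes %25). In a local context the partition value should not be encoded this way.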
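To illustrate where %2526 can come from, here is a minimal sketch using only Python's standard library. It is not the delta-rs code path, and the variable names are hypothetical; it only shows that applying percent-encoding twice to the partition value R&D produces the directory name found on disk.

```python
from urllib.parse import quote, unquote

# Partition value from the reproduction above.
partition_value = "R&D"

# One round of percent-encoding: "&" becomes "%26".
encoded_once = quote(partition_value, safe="")

# A second round encodes the "%" itself as "%25", giving "%2526".
encoded_twice = quote(encoded_once, safe="")

print(f"department={encoded_once}")
print(f"department={encoded_twice}")

# Recovering the original value takes two rounds of decoding as well.
decoded = unquote(unquote(encoded_twice))
print(decoded)
```

The twice-encoded form matches the department=R%2526D directory, while the path in the error message uses the fully decoded value, which would explain why the file is not found.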
