AsyncChunkReader::get_bytes error: Generic MicrosoftAzure error: error decoding response body #2592

Closed
thomasfrederikhoeck opened this issue Jun 12, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@thomasfrederikhoeck
Contributor

Environment

Delta-rs version: 0.18.1

Binding: Python

Environment:

  • Cloud provider: Azure
  • OS: Windows
  • Other:

Bug

What happened:
After 0.18.1 was released, it fixed the initial issue from #2301 for me, but I started hitting this instead. The Z-order operation starts, and I can see network, CPU, and memory usage, but after roughly 30 seconds I get the following error. The Rust logs don't show anything strange:

metrics = self.table._table.z_order_optimize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_internal.DeltaError: Failed to parse parquet: Parquet error: Z-order failed while scanning data: ParquetError(General("AsyncChunkReader::get_bytes error: Generic MicrosoftAzure error: error decoding response body"))

What you expected to happen:
That the Z-order completes.

How to reproduce it:

import os
os.environ["RUST_LOG"]="debug"

from deltalake import DeltaTable
 
blob_path = "az://<redacted path>"
storage_options = {"AZURE_STORAGE_ACCOUNT_NAME": "<redacted sa>", "AZURE_CONTAINER_NAME":'<redacted container>', 'use_azure_cli': 'true'}

dt = DeltaTable(blob_path, storage_options=storage_options)
dt.optimize.z_order(["StatusDateTime"])

More details:

@abhiaagarwal
Contributor

I'm reasonably confident the error is originating from here, based on my read of the various error messages:

while let Some(maybe_batch) = read_stream.next().await {
    let mut batch = maybe_batch?;
    batch = super::cast::cast_record_batch(
        &batch,
        task_parameters.file_schema.clone(),
        false,
        true,
    )?;
    partial_metrics.num_batches += 1;
    writer.write(&batch).await.map_err(DeltaTableError::from)?;
}

Since it's run in a blocking context on the Python side, I'm wondering if that's causing any weirdness (it shouldn't).

@thomasfrederikhoeck
Contributor Author

@abhiaagarwal I wish I could assist but my Rust knowledge is very limited. But let me know if I need to test something.

@Josh-Hiz

Josh-Hiz commented Jul 2, 2024

The same issue also happens occasionally when reading from a Delta table in Azure Gen 2:

  File "pyarrow\\_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
  File "pyarrow\\_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
  File "pyarrow\\error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\\error.pxi", line 88, in pyarrow.lib.check_status
OSError: Generic MicrosoftAzure error: error decoding response body

Here, to_table is the call that raises the error.

@thomasfrederikhoeck
Contributor Author

@Josh-Hiz what happens if you try benchmarking with azcopy like I have done here: apache/arrow-rs#5882 (comment)? Maybe you can add a data point as a comment.

@thomasfrederikhoeck
Contributor Author

Very gentle ping @Josh-Hiz :-)

@ion-elgreco
Collaborator

ion-elgreco commented Jul 29, 2024

@thomasfrederikhoeck try to create a reproducible example that mimics the size and characteristics of your table on Azure. Otherwise no one can properly replicate it.
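
One way such an example could be put together is sketched below. It is a rough starting point only: the helper names, the Id and Value columns, and the file and row counts are assumptions made for illustration (only the StatusDateTime column name comes from the report above), and the sizes would need tuning to match the real table.

```python
import random
from datetime import datetime, timedelta


def make_rows(n_rows, seed=0, start=datetime(2024, 1, 1)):
    """Build n_rows synthetic records with unordered StatusDateTime values.

    Column names other than StatusDateTime are hypothetical.
    """
    rng = random.Random(seed)
    return [
        {
            "Id": i,
            # Random timestamps within 30 days, so Z-ordering has work to do.
            "StatusDateTime": start + timedelta(seconds=rng.randrange(86_400 * 30)),
            "Value": rng.random(),
        }
        for i in range(n_rows)
    ]


def write_synthetic_table(table_uri, n_files, rows_per_file, storage_options=None):
    """Append n_files commits of synthetic rows to a Delta table (hypothetical helper)."""
    # Imported lazily so make_rows can be used without these packages installed.
    import pyarrow as pa
    from deltalake import write_deltalake

    for file_index in range(n_files):
        data = pa.Table.from_pylist(make_rows(rows_per_file, seed=file_index))
        write_deltalake(
            table_uri, data, mode="append", storage_options=storage_options
        )
```

Calling write_synthetic_table with the same az:// URI style and storage_options as in the report, then running dt.optimize.z_order(["StatusDateTime"]) on the result, would give others a table of comparable shape to test against.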

@ion-elgreco
Collaborator

@thomasfrederikhoeck can you test in your environment if I provide you a custom wheel?

@thomasfrederikhoeck
Contributor Author

@ion-elgreco if you have a branch, I can build that and try it tomorrow?

@ion-elgreco
Collaborator

ion-elgreco commented Aug 14, 2024

@thomasfrederikhoeck https://github.com/ion-elgreco/delta-rs/tree/chore/fs_debug

Ah wait this is just for reading through pyarrow : P

@thomasfrederikhoeck
Contributor Author

@ion-elgreco Hmm I'm not seeing that debug print you added if I run:

import os
os.environ["RUST_LOG"]="debug"
os.environ["RUST_BACKTRACE"]="1"

from deltalake import DeltaTable


blob_path = "az://<redacted>"
storage_options = {"AZURE_STORAGE_ACCOUNT_NAME": "<redacted>",
                   "AZURE_CONTAINER_NAME":'l<redacted>', 
                   'azure_use_azure_cli': 'true',

                   }
dt = DeltaTable(blob_path, storage_options=storage_options)
dt.optimize.z_order(["<redacted>"])

Do I need to build with certain maturin args?

@ion-elgreco
Collaborator

Yeah, it's only added for reading through the pyarrow dataset interface. The issue you see requires some refactoring in delta-rs 😞

@ion-elgreco
Collaborator

ion-elgreco commented Aug 16, 2024

@thomasfrederikhoeck can you try this one please, it uses an additional runtime for the writing part of optimize: https://github.com/ion-elgreco/delta-rs/tree/fix/use_different_write_rt

@ion-elgreco
Collaborator

@thomasfrederikhoeck I introduced a separate runtime for IO. Can you try 0.19.1, please, and let me know if things have improved?
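
For readers unfamiliar with the idea, the following is a loose Python analogy of why a separate runtime for IO can help. It is not the delta-rs implementation, and the thread does not state the exact failure mechanism; the sketch only assumes that work which blocks a shared event loop can delay IO tasks scheduled on that same loop. All function names are hypothetical, and time.sleep stands in for work that never yields.

```python
import asyncio
import threading
import time


async def heartbeat(ticks, interval):
    """Stand-in for an IO task; returns the largest gap seen between ticks."""
    worst = 0.0
    last = time.monotonic()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.monotonic()
        worst = max(worst, now - last)
        last = now
    return worst


async def blocking_work(duration):
    """Stand-in for heavy work that holds its event loop without yielding."""
    time.sleep(duration)


async def shared_loop(block):
    """Run the IO task and the blocking work on the same event loop."""
    io_task = asyncio.create_task(heartbeat(5, 0.01))
    await asyncio.sleep(0)  # let the heartbeat start before blocking the loop
    await blocking_work(block)
    return await io_task


def separate_loops(block):
    """Run the IO task on its own loop in a dedicated thread."""
    io_loop = asyncio.new_event_loop()
    io_thread = threading.Thread(target=io_loop.run_forever, daemon=True)
    io_thread.start()
    try:
        future = asyncio.run_coroutine_threadsafe(heartbeat(5, 0.01), io_loop)
        asyncio.run(blocking_work(block))  # blocks only this thread's loop
        return future.result(timeout=5)
    finally:
        io_loop.call_soon_threadsafe(io_loop.stop)
        io_thread.join()
        io_loop.close()


if __name__ == "__main__":
    print("shared loop, worst gap:", asyncio.run(shared_loop(0.3)))
    print("separate loops, worst gap:", separate_loops(0.3))
```

On a shared loop the heartbeat stalls for at least as long as the blocking work runs, whereas on its own loop it keeps ticking at roughly the requested interval.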

@thomasfrederikhoeck
Contributor Author

It worked!! Nice job @ion-elgreco!

@ion-elgreco
Collaborator

@thomasfrederikhoeck Yey :D


4 participants