
Encountering NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?") #1831

Closed
timvw opened this issue Nov 9, 2023 · 3 comments · Fixed by #1885
Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working)
Milestone: Rust v0.17

@timvw
Contributor

timvw commented Nov 9, 2023

Environment

Delta-rs version: 0.16.3

Binding: rust

Environment:

  • OS: macos

Bug

What happened:
Error NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?")

What you expected to happen:
No error, because the directory exists and contains all the (required) data. Reading just works (tm) with Apache Spark or with an earlier delta-rs version.

How to reproduce it:

Try to load the table using the simple API:

deltalake::open_table("file:///Users/tim/src/github/qv/Users/tim/src/github/qv/testing/data/delta/COVID-19_NYT").await?
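When reproducing, it can help to first confirm that the location really contains a commit log. The following is a minimal std-only sketch and is not part of the deltalake API. The helpers `local_path_from_uri` and `has_version_zero_commit` are hypothetical. The sketch assumes a local `file://` URI and only strips the scheme, with no percent-decoding.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a `file://` URI into a local filesystem path.
/// Only the scheme is dropped; percent-encoded characters are not decoded.
fn local_path_from_uri(uri: &str) -> PathBuf {
    PathBuf::from(uri.strip_prefix("file://").unwrap_or(uri))
}

/// Hypothetical helper: does `table_root` contain the version 0 commit file
/// `_delta_log/00000000000000000000.json`?
fn has_version_zero_commit(table_root: &Path) -> bool {
    table_root
        .join("_delta_log")
        .join(format!("{:020}.json", 0))
        .is_file()
}

fn main() {
    for uri in [
        // URI passed to `open_table` in this issue
        "file:///Users/tim/src/github/qv/Users/tim/src/github/qv/testing/data/delta/COVID-19_NYT",
        // Directory shown in the `ls` listing in this issue
        "/Users/tim/src/github/qv/testing/data/delta/COVID-19_NYT",
    ] {
        let root = local_path_from_uri(uri);
        println!(
            "{} has a version 0 commit: {}",
            root.display(),
            has_version_zero_commit(&root)
        );
    }
}
```

If the two results differ, the URI given to `open_table` and the listed directory do not point at the same table.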

More details:

I have this (valid) delta table available in a git repository: https://github.com/timvw/arrow-testing/tree/master/data/delta/COVID-19_NYT.

It does not contain a snapshot, but it does contain a valid version 0 commit file (00000000000000000000.json).

➜  ls -lar /Users/tim/src/github/qv/testing/data/delta/COVID-19_NYT
total 12120
-rw-r--r--@  1 tim  staff  325440 Jun 26 21:00 part-00007-4582392f-9fc2-41b0-ba97-a74b3afc8239-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  883342 Jun 26 21:00 part-00006-d0ec7722-b30c-4e1c-92cd-b4fe8d3bb954-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  895702 Jun 26 21:00 part-00005-4d47f8ff-94db-4d32-806c-781a1cf123d2-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  890662 Jun 26 21:00 part-00004-1bb9c3e3-c5b0-4d60-8420-23261f58a5eb-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  879637 Jun 26 21:00 part-00003-539aff30-2349-4b0d-9726-c18630c6ad90-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  838642 Jun 26 21:00 part-00002-8826af84-73bd-49a6-a4b9-e39ffed9c15a-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  786636 Jun 26 21:00 part-00001-9d9d980b-c500-4f0b-bb96-771a515fbccc-c000.snappy.parquet
-rw-r--r--@  1 tim  staff  690424 Jun 26 21:00 part-00000-a496f40c-e091-413a-85f9-b1b69d4b3b4e-c000.snappy.parquet
drwxr-xr-x@  7 tim  staff     224 Jun 26 21:00 _delta_log
drwxr-xr-x@  3 tim  staff      96 Jun 26 21:00 ..
drwxr-xr-x@ 11 tim  staff     352 Jun 26 21:00 .
➜  ls -lar /Users/tim/src/github/qv/testing/data/delta/COVID-19_NYT/_delta_log 
total 24
-rw-r--r--@  1 tim  staff  5624 Jun 26 21:00 00000000000000000000.json
-rw-r--r--@  1 tim  staff    92 Jun 26 21:00 00000000000000000000.crc
-rw-r--r--@  1 tim  staff     0 Jun 26 21:00 .s3-optimization-2
-rw-r--r--@  1 tim  staff     0 Jun 26 21:00 .s3-optimization-1
-rw-r--r--@  1 tim  staff     0 Jun 26 21:00 .s3-optimization-0
drwxr-xr-x@ 11 tim  staff   352 Jun 26 21:00 ..
drwxr-xr-x@  7 tim  staff   224 Jun 26 21:00 .

Because there is no snapshot, the code ends up in get_latest_version, which tries to list the files with the _delta_log prefix.

No files are found there.
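The lookup described above amounts to listing `_delta_log` and picking the highest commit version. The sketch below is a simplified, hypothetical illustration of that idea, not delta-rs's actual `get_latest_version`. It treats only file names of the form `{version:020}.json` as commits and ignores everything else, such as the `.crc` and `.s3-optimization-*` files in the listing above. A `None` result corresponds to the "no files found" case.

```rust
/// Hypothetical helper: parse a commit file name like `00000000000000000000.json`
/// into its version number. Returns `None` for any other file.
fn commit_version(file_name: &str) -> Option<i64> {
    let stem = file_name.strip_suffix(".json")?;
    // Commit files use a 20-character, zero-padded, non-negative version.
    if stem.len() == 20 && stem.bytes().all(|b| b.is_ascii_digit()) {
        stem.parse().ok()
    } else {
        None
    }
}

/// Hypothetical helper: highest commit version among the listed names,
/// or `None` if the listing contains no commit files at all.
fn latest_version<'a>(names: impl IntoIterator<Item = &'a str>) -> Option<i64> {
    names.into_iter().filter_map(commit_version).max()
}

fn main() {
    // File names taken from the `_delta_log` listing in this issue.
    let listing = [
        "00000000000000000000.json",
        "00000000000000000000.crc",
        ".s3-optimization-2",
        ".s3-optimization-1",
        ".s3-optimization-0",
    ];
    println!("{:?}", latest_version(listing)); // Some(0)

    // An empty listing: nothing to open.
    let empty: [&str; 0] = [];
    println!("{:?}", latest_version(empty)); // None
}
```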

timvw added the bug (Something isn't working) label Nov 9, 2023
@timvw
Contributor Author

timvw commented Nov 9, 2023

The first thing I noticed is that version is set to -1, which results in a funky offset_path: "_delta_log/-0000000000000000001.json".
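Rust's zero-padded width formatting counts the sign as part of the width. That is why a version of -1 becomes the odd file name above instead of a valid commit path. The std-only demonstration below uses `commit_file_name`, a hypothetical stand-in that only builds the file name. It is not delta-rs's `commit_uri_from_version`.

```rust
/// Hypothetical stand-in for the naming part of `commit_uri_from_version`:
/// the version zero-padded to a width of 20 characters, followed by `.json`.
fn commit_file_name(version: i64) -> String {
    format!("{version:020}.json")
}

fn main() {
    println!("{}", commit_file_name(0)); // 00000000000000000000.json
    println!("{}", commit_file_name(7)); // 00000000000000000007.json
    // The minus sign takes one of the 20 characters, leaving 19 digits.
    println!("{}", commit_file_name(-1)); // -0000000000000000001.json
}
```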

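But even when I update this function as follows:

use object_store::path::Path;

// DELTA_LOG_PATH is the `_delta_log` path constant defined elsewhere in the crate.
pub(crate) fn commit_uri_from_version(version: i64) -> Path {
    // Clamp negative versions (such as the -1 above) to 0.
    let version_to_use = version.max(0);
    let version = format!("{version_to_use:020}.json");
    DELTA_LOG_PATH.child(version.as_str())
}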

Still, no files are found.

rtyler added the binding/rust (Issues for the Rust crate) label Nov 15, 2023
rtyler added this to the Rust v0.17 milestone Nov 15, 2023
dimonchik-suvorov pushed commits to dimonchik-suvorov/delta-rs and dintegrity/delta-rs that referenced this issue Nov 19, 2023
# Description
According to the issue, a table without a snapshot (only version 0) should fail to load, but the test is written to check that a Delta table at version 0 can be read and loaded in Rust (the functions `open_table` and `open_table_with_version` work).

# Related Issue(s)
- closes delta-io#1831
@dimonchik-suvorov
Contributor

Unfortunately, I can't reproduce this issue with the test I've added, but this is my first experience with Rust, so perhaps I'm missing something...

@rtyler
Member

rtyler commented Nov 19, 2023

Always nice to see a friendly face 👋 @dimonchik-suvorov! I tried this out with a Python install of deltalake and, unfortunately, could not reproduce the issue either 😦

@timvw There shouldn't need to be a snapshot in the directory. When I cloned your repo, I was able to load the table successfully with our 0.13 Python deltalake release:

>>> from deltalake import DeltaTable
>>> dt = DeltaTable('./COVID-19_NYT')
>>> dt.files()
['part-00000-a496f40c-e091-413a-85f9-b1b69d4b3b4e-c000.snappy.parquet', 'part-00001-9d9d980b-c500-4f0b-bb96-771a515fbccc-c000.snappy.parquet', 'part-00002-8826af84-73bd-49a6-a4b9-e39ffed9c15a-c000.snappy.parquet', 'part-00003-539aff30-2349-4b0d-9726-c18630c6ad90-c000.snappy.parquet', 'part-00004-1bb9c3e3-c5b0-4d60-8420-23261f58a5eb-c000.snappy.parquet', 'part-00005-4d47f8ff-94db-4d32-806c-781a1cf123d2-c000.snappy.parquet', 'part-00006-d0ec7722-b30c-4e1c-92cd-b4fe8d3bb954-c000.snappy.parquet', 'part-00007-4582392f-9fc2-41b0-ba97-a74b3afc8239-c000.snappy.parquet']
>>> dt.schema()
Schema([Field(date, PrimitiveType("string"), nullable=True), Field(county, PrimitiveType("string"), nullable=True), Field(state, PrimitiveType("string"), nullable=True), Field(fips, PrimitiveType("integer"), nullable=True), Field(cases, PrimitiveType("integer"), nullable=True), Field(deaths, PrimitiveType("integer"), nullable=True)])
>>> df = dt.to_pandas()
>>> df
               date      county       state     fips  cases  deaths
0        2020-01-21   Snohomish  Washington  53061.0      1     0.0
1        2020-01-22   Snohomish  Washington  53061.0      1     0.0
2        2020-01-23   Snohomish  Washington  53061.0      1     0.0
3        2020-01-24        Cook    Illinois  17031.0      1     0.0
4        2020-01-24   Snohomish  Washington  53061.0      1     0.0
...             ...         ...         ...      ...    ...     ...
1111925  2021-03-11  Sweetwater     Wyoming  56037.0   3870    36.0
1111926  2021-03-11       Teton     Wyoming  56039.0   3418     9.0
1111927  2021-03-11       Uinta     Wyoming  56041.0   2087    12.0
1111928  2021-03-11    Washakie     Wyoming  56043.0    887    26.0
1111929  2021-03-11      Weston     Wyoming  56045.0    631     5.0

[1111930 rows x 6 columns]
>>>

rtyler pushed a commit that referenced this issue Nov 30, 2023
3 participants