Trino Iceberg s3 path handling incompatibility when using s3a:// or s3n:// paths with // #23097

Open
sopel39 opened this issue Aug 21, 2024 · 11 comments
Labels: bug (Something isn't working)

sopel39 (Member) commented Aug 21, 2024

Introduction:

  • Trino's legacy S3 FS converts // -> / for s3a/s3n paths, but preserves // for s3 paths (see the sketch below)
  • Trino's native S3 FS preserves // for all of s3/s3a/s3n
  • Other engines, e.g. Spark, preserve // for s3/s3a/s3n paths
  • The Iceberg catalog stores the location of the metadata file (with //) as part of the table information
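For illustration, the collapsing behaviour comes from the legacy FS wrapping locations in Hadoop's org.apache.hadoop.fs.Path, whose constructor normalizes consecutive slashes. A minimal sketch, assuming Hadoop is on the classpath (Trino's HadoopPaths#hadoopPath then restores the original path, but only for the s3 scheme, as shown in the code quoted later in this thread):

import org.apache.hadoop.fs.Path;

public class SlashCollapseDemo
{
    public static void main(String[] args)
    {
        // Hadoop's Path normalizes the path component, collapsing "//" into "/"
        System.out.println(new Path("s3a://bucket/schema//table"));
        // prints: s3a://bucket/schema/table -- the "//" is gone
        // the native S3 FS, by contrast, keeps the location verbatim
    }
}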

Trino breaking behaviour

  1. Create a table in Trino with the legacy FS:
trino> create table iceberg.schema.table (x bigint) with (location='s3a://bucket/schema//table')
CREATE TABLE
trino> show create table iceberg.schema.table;
                                               Create Table
----------------------------------------------------------------------------------------------------------
 CREATE TABLE iceberg.schema.table (
    x bigint
 )
 WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://bucket/schema//table'
 )
  2. Query with the native FS:
trino> select * from iceberg.schema.table;
...
Caused by: java.io.FileNotFoundException: s3a://bucket/schema//table/metadata/00002-279c8ac6-0628-4687-aeb5-ef58b53d3b4a.metadata.json
	at io.trino.filesystem.s3.S3InputStream.seekStream(S3InputStream.java:199)
  3. Query the data with other engines, e.g. Spark:
spark-sql ()> select * from iceberg.schema.table;
24/08/21 13:35:46 ERROR SparkSQLDriver: Failed in [select * from iceberg.schema.table]
org.apache.iceberg.exceptions.ServiceFailureException: Server error: NotFoundException: Location does not exist: s3a://bucket/schema//table/metadata/00002-279c8ac6-0628-4687-aeb5-ef58b53d3b4a.metadata.json

Problem

  • How do we preserve forward/backward compatibility when the native FS is finally turned on, assuming it is HARD to rewrite existing, potentially broken data?

Solution proposals

  1. Make the native FS behave like the legacy FS with regards to s3a/s3n (under a feature flag)?
  2. Try to open both locations (with and without //), but write only with // for both legacy and native FS? (A sketch follows below.)
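A rough, self-contained sketch of what proposal 2 could look like on the read path (the helper below is hypothetical, not an existing Trino API): given a stored location, compute the candidate locations to try in order -- the location exactly as written first, then a variant with duplicate slashes collapsed:

import java.util.ArrayList;
import java.util.List;

public class CandidateLocations
{
    // Hypothetical helper for proposal 2: the stored location first,
    // then the same location with "//" collapsed in the path component
    // (scheme and bucket left untouched).
    static List<String> candidates(String location)
    {
        List<String> result = new ArrayList<>();
        result.add(location);
        int pathStart = location.indexOf('/', location.indexOf("://") + 3);
        if (pathStart > 0) {
            String collapsed = location.substring(0, pathStart)
                    + location.substring(pathStart).replaceAll("/+", "/");
            if (!collapsed.equals(location)) {
                result.add(collapsed);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        // [s3a://bucket/schema//table/metadata/x.json, s3a://bucket/schema/table/metadata/x.json]
        System.out.println(candidates("s3a://bucket/schema//table/metadata/x.json"));
    }
}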

cc @electrum @osscm @rstyp

findepi (Member) commented Aug 21, 2024

There was a problem with // in paths which resulted in both (a) // being replaced with / and (b) a url-encoded path being appended to the path. It looks like this is a different problem, but it would be great if you could confirm.

findepi (Member) commented Aug 22, 2024

#17803 might be the issue, but I'm not sure; it does not have examples.

sopel39 (Member, Author) commented Aug 22, 2024

#17803 might be the issue, but I'm not sure; it does not have examples.

Seems like a similar issue. #17958 doesn't help if another engine creates such a Hive table with //. Anyway, my issue is more about Iceberg than Hive.

@findepi changed the title from "Trino Iceberg s3 path handling incompatibility" to "Trino Iceberg s3 path handling incompatibility when using s3a:// or s3n:// paths with //" on Aug 22, 2024
findinpath (Contributor) commented

@sopel39 is this issue reproducible with Trino 454?

sopel39 (Member, Author) commented Aug 23, 2024

@findinpath yes, the issue has been reproducible since io.trino.filesystem.hdfs.HadoopPaths#hadoopPath was introduced, so many years back.

We are actually leaning towards creating an Iceberg table property, e.g. native_fs_legacy_path_mode, which would remove // from S3 paths for selected tables when the native FS is used (a sketch follows the list below). This would:

  1. allow accessing "broken" tables from the native FS,
  2. make new Iceberg tables correct going forward.
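A minimal sketch of what that could mean inside the native FS (the property name and the helper are hypothetical; the logic just mimics the legacy Hadoop normalization, which collapses "//" for s3a/s3n but not for plain s3):

// Hypothetical: applied only when the proposed native_fs_legacy_path_mode
// table property is set. Collapse "//" in the path component of s3a/s3n
// locations; leave s3 locations untouched, exactly like the legacy FS.
static String toLegacyPath(String location)
{
    if (location.startsWith("s3a://") || location.startsWith("s3n://")) {
        int pathStart = location.indexOf('/', location.indexOf("://") + 3);
        if (pathStart > 0) {
            return location.substring(0, pathStart)
                    + location.substring(pathStart).replaceAll("/+", "/");
        }
    }
    return location;
}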

osscm (Contributor) commented Aug 23, 2024

Yes, we need this: the legacy FS will soon be gone, so we need a way to read the broken tables from the native FS.

We are also leaning towards keeping this mode/functionality exactly the same as the legacy FS, where // is preserved only for s3 and not for s3a or s3n.

zhaner08 (Contributor) commented

We definitely have people relying on the legacy behavior; at the least, add a flag to support the behavior if we don't want to make it the default.

findinpath (Contributor) commented Aug 30, 2024

We went through with the investigation on this one (although it took a longer time to get to look deeper into it). Apologies for the delay in replying.

tl;dr:

Moving the files on the object storage from s3://bucket/double/slashes/table1 to s3://bucket/double//slashes/table1 (notice / -> // before slashes) fixes the problem and matches the locations of the files linked from the metadata and manifest files of the Iceberg tables. (A sketch of such a move is below.)
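For reference, a hedged sketch of such a move with the AWS SDK for Java v2 (bucket name and prefixes are taken from the example above; deleting the old objects afterwards is omitted):

import software.amazon.awssdk.services.s3.S3Client;

public class MoveToDoubleSlashPrefix
{
    public static void main(String[] args)
    {
        // Copy every object from the single-slash prefix to the double-slash
        // prefix that the Iceberg metadata actually references. S3 keys are
        // plain strings, so "double//slashes/..." is a perfectly valid key.
        try (S3Client s3 = S3Client.create()) {
            String bucket = "bucket";
            String from = "double/slashes/table1/";
            String to = "double//slashes/table1/";
            s3.listObjectsV2Paginator(request -> request.bucket(bucket).prefix(from))
                    .contents()
                    .forEach(object -> s3.copyObject(copy -> copy
                            .sourceBucket(bucket)
                            .sourceKey(object.key())
                            .destinationBucket(bucket)
                            .destinationKey(to + object.key().substring(from.length()))));
        }
    }
}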

What's left to do is to adjust HadoopPaths in the legacy FS to handle not only s3, but also s3a & s3n:

if ("s3".equals(hadoopPath.toUri().getScheme()) && !path.equals(hadoopPath.toString())) {

We'll follow up soon with a PR, and we'll work with @mosabua on documenting the best "fix" plan for this unfortunate hiccup.

Why is "move" (copy to new location & delete old files) in detriment of a backward-compatibility table property being suggested?

The table created on s3a or s3n without properly encoding the paths is effectively broken because the it has metadata files pointing to invalid locations.
I see no reason to bend the trinodb/trino s3 native file system to deal with broken tables.

mosabua (Member) commented Aug 30, 2024

Why is "move" (copy to new location & delete old files) in detriment of a backward-compatibility table property being suggested?

The table created on s3a or s3n without properly encoding the paths is effectively broken because the it has metadata files pointing to invalid locations. I see no reason to bend the trinodb/trino s3 native file system to deal with broken tables.

I think we cannot force users to perform this move when they migrate from the legacy file system to the new file system support. After all, they used Trino in the first place to get the tables in place. The move potentially comes with large cost and risk, and we should offer users a way to migrate to the new file system without a move.

mosabua (Member) commented Aug 30, 2024

Also, we are actively discussing this on Slack, and it will take some time to make a decision together with @electrum and others.

https://trinodb.slack.com/archives/C07ABNN828M/p1724946906029559

mayankvadariya (Contributor) commented Sep 6, 2024

Meeting minutes with @mosabua @electrum @findinpath @mayankvadariya

We've decided to add a Trino Iceberg table migration procedure to update the table metadata/manifests to contain the correct S3 paths, so that the table will be queryable with the native file system.
