Update documentation for COPY command #9931

Merged
merged 4 commits into from
Apr 7, 2024

Contributor

@alamb alamb commented Apr 3, 2024

Which issue does this PR close?

closes #9927

Rationale for this change

Looks like I missed a spot while updating the docs in #9754

What changes are included in this PR?

  1. Remove old options from COPY write options documentation
  2. Update COPY

Are these changes tested?

CI doc checks

Are there any user-facing changes?

Docs

@alamb alamb added the documentation Improvements or additions to documentation label Apr 3, 2024
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Apr 3, 2024
OPTIONS(
NULL_VALUE 'NAN'
);
my_table(a bigint, b bigint)
Contributor Author

I found the existing formatting hard to read, so I added some whitespace

```
)
```

In this example, we write the entirety of `source_table` out to a folder of parquet files. One parquet file will be written in parallel to the folder for each partition in the query. The next option, `compression`, set to `snappy`, indicates that all columns should use the snappy compression codec unless otherwise specified. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use the `ZSTD` compression codec with compression level `5`. In general, parquet options that support column-specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
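
To make the `OPTION::COLUMN.NESTED.PATH` key syntax concrete, here is a minimal Python sketch that splits such a key into the option name and the column path. The helper `split_option_key` is hypothetical and written only for illustration; it is not DataFusion's parser, and it assumes column names themselves contain no `.` characters.

```python
from typing import List, Optional, Tuple


def split_option_key(key: str) -> Tuple[str, Optional[List[str]]]:
    """Split an option key of the form OPTION::COLUMN.NESTED.PATH.

    Hypothetical helper for illustration only; not part of DataFusion.
    Returns (option, None) for a table-wide option, or
    (option, [column, nested, path]) for a column-specific override.
    """
    # partition() splits on the first "::" only; sep is "" when it is absent.
    option, sep, column_path = key.partition("::")
    if not sep:
        return option, None
    # Assumption: "." separates nested path segments and never appears in a name.
    return option, column_path.split(".")


for key in ("compression", "compression::col1", "compression::col1.nested.path"):
    print(key, "->", split_option_key(key))
```

Under these assumptions, a key without `::` applies to every column, while a key with `::` targets only the named column or nested field.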

## Available Options

### COPY Specific Options

The following special options are specific to the `COPY` command.
Contributor Author

These options are now specified directly in the DML syntax itself, so I removed them from here

@@ -35,8 +35,22 @@ TO '<i><b>file_name</i></b>'
[ OPTIONS( <i><b>option</i></b> [, ... ] ) ]
</pre>

`STORED AS` specifies the file format the `COPY` command will write. If this
Contributor Author

Ported / reworded this content from write options page

format parquet,
compression snappy,
'compression::col1' 'zstd(5)',
partition_by 'column3, column4'
Contributor

Is this the correct format? Based on #9927, `partition_by` has moved to the DML, so it should be something like: `COPY t1 TO '/tmp/hive_output/' PARTITIONED BY (col1) OPTIONS (format parquet);`

Contributor Author

that is an excellent point -- I fixed it in af55db8 (I also tested that it works locally):

```
❯ create table source_table as values ('1','2','3','4');
0 row(s) fetched.
Elapsed 0.021 seconds.

❯ COPY source_table
  TO 'test/table_with_options'
  PARTITIONED BY (column3, column4)
  OPTIONS (
    format parquet,
    compression snappy,
    'compression::column1' 'zstd(5)',
  )
;
+-------+
| count |
+-------+
| 1     |
+-------+
```

Contributor Author

alamb commented Apr 4, 2024

Thank you for the review @hveiga 🙏

Contributor

@hveiga hveiga left a comment

lgtm! Thank you for these changes!

Contributor

@comphead comphead left a comment

lgtm thanks @alamb and @hveiga

@comphead comphead merged commit 85b4e40 into apache:main Apr 7, 2024
4 checks passed
@alamb alamb deleted the alamb/fix_docs2 branch April 7, 2024 17:41

Successfully merging this pull request may close these issues.

COPY fails on cli with Invalid statement