Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: explicit python exceptions #1409

Merged
merged 9 commits into from
Jun 2, 2023
Merged

feat: explicit python exceptions #1409

merged 9 commits into from
Jun 2, 2023

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented May 29, 2023

Description

Working on an integration, I felt that our error handling in python was a bit opaque and handling errors on the python side would have to rely on inspecting the error message, which can easily silently break, if not covered by tests.

The proposed approach would be to introduce a few more python exceptions and - as we already do - rely on existing python exceptions where appropriate.

As this would be a bigger breaking change on the python side I guess, looking for some feedback, if you think this is worthwhile.

@wjones127 @fvaleye @rtyler

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/python Issues for the Python package label May 29, 2023
@wjones127
Copy link
Collaborator

My overall opinion on Python errors is that we should be mapping to standard Python exceptions in most cases, except:

  • TableNotFoundError: for when we can't find the Delta Table at a path
  • DeltaProtocolError: for when an operation was tried that isn't allowed by the delta protocol (e.g. trying to write to a delta table with a writer version we don't yet support)
  • CommitFailedError: When a commit write fails.

But most other things could be Python exceptions:

  • Tokio errors: RuntimeError
  • Chrono parse error: ValueError
  • Object store not found: FileNotFoundError
  • std::io::Error: IOError

The purpose of these errors should be making it as easy as possible for Python users to write error handling code. If I were making breaking changes to the Python exceptions, I would drop the generic PyDeltaTableError and only make custom errors for the specific cases that aren't already covered by built-in exceptions.

@roeap
Copy link
Collaborator Author

roeap commented May 30, 2023

My overall opinion on Python errors is that we should be mapping to standard Python exceptions in most cases

Completely agree! Previously had a look into the datafusion python repo, and they already did a lot of work mapping to python exceptions. We should be able to adopt much of that since our stacks overlap quite a bit ...

I also like the custom exceptions you proposed, along with being generally cautious of adding additional custom ones.

@roeap roeap marked this pull request as ready for review May 30, 2023 21:22
@roeap
Copy link
Collaborator Author

roeap commented May 30, 2023

@wjones127 - converted all the errors and did not find any that would warrant a new custom error type other then the ones you mentioned. At times I may have been too liberal with the DeltaProtocolError as I interpreted it as thing specified in the protocol. I "think" the mention to atomicity is alos in that document so e.g. the unsafe rename error is also a protocol error.

Main think I took away from this though is that we have to a) name / use errors more consistently throughout delta-rs, and b) simplify our errors quite a bit, but this goes beyond the scope of this PR ...

Copy link
Member

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup, especially the ergonomics of mapping errors to PythonError::from.

Some minor suggestions, but I think once @wjones127 approves this should be landed

python/deltalake/writer.py Outdated Show resolved Hide resolved
python/src/error.rs Outdated Show resolved Hide resolved
python/src/error.rs Outdated Show resolved Hide resolved
Comment on lines +43 to +47
ObjectStoreError::Generic { source, .. }
if source.to_string().contains("AWS_S3_ALLOW_UNSAFE_RENAME") =>
{
DeltaProtocolError::new_err(source.to_string())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.map_err(PyDeltaTableError::from_object_store)?;
.map_err(PythonError::from)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if we implemented:

impl<T: Into<PythonError>> From<T> for PyErr { ... }

could we remove the map_err entirely?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea, and gave it a quick go, but the compiler is complaining that Into is not our trait, and neither is PyErr, so it won't allow it :(

Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Besides existing comments had one thought about whether we could remove some more code.

@roeap roeap enabled auto-merge (squash) June 2, 2023 05:53
@roeap roeap merged commit fc61ea7 into delta-io:main Jun 2, 2023
@roeap roeap deleted the python-errors branch June 2, 2023 05:58
roeap added a commit to roeap/delta-rs that referenced this pull request Jun 2, 2023
Working on an integration, I felt that our error handling in python was
a bit opaque and handling errors on the python side would have to rely
on inspecting the error message, which can easily silently break, if not
covered by tests.

The proposed approach would be to introduce a few more python exceptions
and - as we already do - rely on existing python exceptions where
appropriate.

As this would be a bigger breaking change on the python side I guess,
looking for some feedback, if you think this is worthwhile.

@wjones127 @fvaleye @rtyler

<!---
For example:

- closes #106
--->

<!---
Share links to useful documentation
--->

---------

Co-authored-by: R. Tyler Croy <[email protected]>
roeap added a commit that referenced this pull request Jun 3, 2023
# Description

While doing #1409 it became evident, that our errors are somewhat
organically grown and could do with some pruning. At the same time we
are hoping to reorganize (#1136) delta-rs a bit, to make it easier to
reason about.

This PR is a first attempt to introduce a more explicit approach to how
we model our errors. Here I'd like to propose we work towards the
approach taken in the object store crate - specifically have very
specific errors in submodules, but do not surface those errors to the
user. Rather collapse them into eventually a single error type.

Since delta-rs does much more things then object store, I believe we
will eventually end up with some more top level errors, or have maybe a
two level hierarch.

As a first step in this PR:
- move `checkpoints.rs` into actions module (as proposed in #1136)
- move top level errors into new `errors` module - at lest the ones
defined in `delta.rs`
- try and consolidate `ActionError` and `CheckpointError` which are now
part of the `action` module.

As I needed to touch a bunch of imports anyhow, I took the liberty to
organize them according to what to the best of my knowledge is the
[leading
contender](https://rust-lang.github.io/rustfmt/?version=v1.4.38&search=#StdExternalCrate%5C%3A)
for what rust-fmt might do.

@wjones127 @rtyler @Blajda - opening this as a draft to get some
feedback, as this is a somewhat intrusive change to the overall crate,
in case you want to go another way.

# Related Issue(s)

part of: #1136

# Documentation

<!---
Share links to useful documentation
--->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants