HDFS: fix target file overwrite #1834
* propagated overwrite information to HdfsWriter base trait
* made child classes override this property
* SequenceWriter factories now take an overwrite parameter
* deleted existing file if overwrite is enabled in HdfsWriter#moveToTarget function
… in child classes for better code structure understanding.
At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user
I ignored binary compatibility issues since it affects only internal APIs.
Thanks for the PR. It looks good to me, and I really appreciate you looking into the broken atomicity and suggesting a fix for that as well. We are planning to release Alpakka 2.0 in a month or so. Some connectors have accumulated API breaking changes and thus the time for a major release. If you would be willing to upgrade this PR to use
Should the migration to
So even if we migrate to
A new ticket for migration to A new PR that does the migration then will close that new ticket and #1761 as well.
LGTM.
Thank you for fixing this! Keep the PRs coming.
Fixes #1761
Purpose
Overwrite the HDFS target file when the `overwrite` setting is set to true.
Changes
Background Context
This is the simplest fix I could come up with, but there are still some issues with the code imho:
* The result of the `moveToTarget` operation is ignored at https://github.com/akka/alpakka/blob/master/hdfs/src/main/scala/akka/stream/alpakka/hdfs/impl/HdfsFlowStage.scala#L144 . This results in silent errors, which led to the original issue. What would be the best approach to handle errors here?
* `org.apache.hadoop.fs.FileSystem` does not provide a `rename` function that can take an overwrite flag or option. This is why I had to break the atomicity of the `rename` operation with a `delete`. `org.apache.hadoop.fs.FileContext` does provide such a `rename` method https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/fs/FileContext.html#rename-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Options.Rename...- but using it would require using `FileContext` instead of `FileSystem` all the way around, and would ultimately break the public API.
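To make the two options above concrete, here is a rough sketch of both approaches. It is not the code from this PR: the object `OverwriteRenameSketch` and its method names are hypothetical, and only the Hadoop calls (`FileSystem#exists`, `FileSystem#delete`, `FileSystem#rename`, `FileContext#rename`, `Options.Rename`) are real APIs. The first variant also shows one possible way to surface failures instead of ignoring the `Boolean` results, assuming that throwing an `IOException` is acceptable to the caller.

```scala
import java.io.IOException

import org.apache.hadoop.fs.{FileContext, FileSystem, Options, Path}

// Hypothetical helper for illustration only; not part of Alpakka.
object OverwriteRenameSketch {

  // Variant 1: FileSystem only (delete, then rename).
  // Not atomic: between the delete and the rename the target does not exist.
  def moveWithFileSystem(fs: FileSystem, temp: Path, target: Path, overwrite: Boolean): Unit = {
    if (overwrite && fs.exists(target)) {
      // recursive = false, since the target is expected to be a single file
      if (!fs.delete(target, false))
        throw new IOException(s"Could not delete existing target $target")
    }
    // FileSystem#rename can report failure through its Boolean result,
    // so the result is checked rather than discarded.
    if (!fs.rename(temp, target))
      throw new IOException(s"Could not rename $temp to $target")
  }

  // Variant 2: FileContext, which accepts rename options directly.
  // FileContext#rename returns nothing and throws on failure.
  def moveWithFileContext(fc: FileContext, temp: Path, target: Path, overwrite: Boolean): Unit = {
    val option = if (overwrite) Options.Rename.OVERWRITE else Options.Rename.NONE
    fc.rename(temp, target, option)
  }
}
```

The second variant avoids the separate `delete`, but as noted above it needs a `FileContext` wherever a `FileSystem` is passed today.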