ARROW-13628: [Format][C++][Java] Add MONTH_DAY_NANO interval type #10177
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
See also:
Looks like a good start. Can you nix the submodule update?
Thank you @emkornfield
I did some more digging, and one thing I noticed is that the INTERVAL_TYPE defined in the Parquet reference uses a similar construction with three fields.
It uses 4 bytes to represent months, 4 bytes to represent days, and 4 bytes to represent milliseconds, rather than 8 bytes to represent nanoseconds.
I know I was lobbying for nanosecond-precision intervals, but given that Postgres and Parquet both use millisecond precision, perhaps it would be best if Arrow followed the same model?
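To make the size and range trade-off concrete, here is a minimal sketch using two hypothetical plain structs (`ParquetInterval` and `MonthDayNano` are illustrative names, not types from the Arrow or Parquet libraries). It assumes Parquet's three unsigned 32-bit fields and the signed 32/32/64-bit layout proposed in this PR, and shows how much whole-day range each sub-day field can hold on its own.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical mirror of Parquet INTERVAL: 12 bytes holding three unsigned
// 32-bit little-endian integers (months, days, milliseconds).
struct ParquetInterval {
  uint32_t months;
  uint32_t days;
  uint32_t milliseconds;
};

// Hypothetical mirror of the proposed MONTH_DAY_NANO layout: signed 32-bit
// months and days plus a signed 64-bit nanosecond count.
struct MonthDayNano {
  int32_t months;
  int32_t days;
  int64_t nanoseconds;
};

static_assert(sizeof(ParquetInterval) == 12, "three 4-byte fields");
static_assert(sizeof(MonthDayNano) == 16, "4 + 4 + 8 bytes, no padding");

constexpr int64_t kMillisPerDay = 86'400'000;
constexpr int64_t kNanosPerDay = 86'400'000'000'000;

// Largest whole number of days each sub-day field can represent by itself.
constexpr int64_t kMaxDaysInParquetMillis =
    std::numeric_limits<uint32_t>::max() / kMillisPerDay;
constexpr int64_t kMaxDaysInArrowNanos =
    std::numeric_limits<int64_t>::max() / kNanosPerDay;

// Any Parquet millisecond value widens to nanoseconds without overflow.
constexpr int64_t MillisToNanos(uint32_t ms) {
  return static_cast<int64_t>(ms) * 1'000'000;
}
```

The millisecond field tops out at 49 whole days, while the 64-bit nanosecond field covers 106751 days (roughly 292 years), so the wider field buys both finer precision and more range.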
I thought Postgres used microseconds, not milliseconds. This is why the Postgres struct there used 8 bytes for seconds/subseconds. I actually liked the argument for matching the finest granularity offered by other Arrow temporal types. I'm not sure any alignment is going to be perfect. Mapping this type onto language-specific concepts is already going to require a special type (e.g., nothing in the survey seems to include all the fields together).
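As a sketch of what mapping a microsecond-based interval onto the nanosecond layout involves: PostgreSQL's C `Interval` struct stores a 64-bit `time` field in microseconds plus 32-bit `day` and `month` fields. The `PgInterval`, `MonthDayNano`, and `FromPostgres` names below are hypothetical, not PostgreSQL or Arrow APIs. Multiplying by 1000 is exact but can overflow for very large values, so the conversion checks the range first.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical mirror of the field layout of PostgreSQL's Interval:
// microseconds first, then days, then months.
struct PgInterval {
  int64_t time_us;
  int32_t day;
  int32_t month;
};

// Hypothetical mirror of the proposed MONTH_DAY_NANO fields.
struct MonthDayNano {
  int32_t months;
  int32_t days;
  int64_t nanoseconds;
};

struct ConversionResult {
  bool ok;             // false if the microseconds would overflow as nanoseconds
  MonthDayNano value;  // meaningful only when ok is true
};

constexpr ConversionResult FromPostgres(PgInterval pg) {
  constexpr int64_t kNanosPerMicro = 1000;
  constexpr int64_t kMaxMicros = std::numeric_limits<int64_t>::max() / kNanosPerMicro;
  constexpr int64_t kMinMicros = std::numeric_limits<int64_t>::min() / kNanosPerMicro;
  if (pg.time_us > kMaxMicros || pg.time_us < kMinMicros) {
    return {false, MonthDayNano{0, 0, 0}};
  }
  // Months and days carry over unchanged; only the sub-day field is rescaled.
  return {true, MonthDayNano{pg.month, pg.day, pg.time_us * kNanosPerMicro}};
}
```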
Makes sense to me.
I believe this would also imply a minor bump to the Arrow columnar format version (1.0.0 -> 1.1.0)? From a quick glance it seems forwards compatibility should be OK (I think you'll get
Yes, I added a comment above with the version history. We should discuss this on the mailing list, but I think it makes sense to release as soon as the format changes are accepted.
Looks great. Thanks a lot, @emkornfield !
The Parquet reference includes this paragraph:
Should something similar be added to clarify that, e.g., the nanoseconds field is not limited to 1 day?
@jorisvandenbossche Good point. I added some text; let me know if it makes sense.
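The point about field independence can be illustrated with a small sketch. It assumes a hypothetical `MonthDayNano` struct and a hypothetical `FieldwiseEqual` helper that compares values field by field without normalizing between fields: a calendar day is not always 24 hours (e.g. across a DST change) and a month has no fixed number of days, so "1 day" and "86,400 seconds in nanoseconds" are different values, and a nanosecond count larger than one day is still valid.

```cpp
#include <cstdint>

// Hypothetical struct mirroring the MONTH_DAY_NANO fields.
struct MonthDayNano {
  int32_t months;
  int32_t days;
  int64_t nanoseconds;
};

// Compares field by field; no carrying between nanoseconds, days, and
// months is attempted.
constexpr bool FieldwiseEqual(const MonthDayNano& a, const MonthDayNano& b) {
  return a.months == b.months && a.days == b.days &&
         a.nanoseconds == b.nanoseconds;
}

constexpr int64_t kNanosPerDay = 86'400'000'000'000;

// One calendar day versus 24 hours expressed purely in nanoseconds.
constexpr MonthDayNano kOneDay{0, 1, 0};
constexpr MonthDayNano kTwentyFourHours{0, 0, kNanosPerDay};

// A nanosecond count spanning ten days is still a valid, unnormalized value.
constexpr MonthDayNano kTenDaysOfNanos{0, 0, 10 * kNanosPerDay};
```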
Looks good!
Was this voted on already? I forget.
It was not, sorry. I've been delayed on the implementation, which I think is a prerequisite for a vote.
On Tue, Jun 1, 2021 at 9:23 AM Antoine Pitrou wrote:
> Was this voted on already? I forget.
Are you planning to write the C++ and Java implementations, @emkornfield?
Yes, still planning on it. Hopefully I will have some time this week.
Co-authored-by: Jacob Quinn
@liyafan82 @pitrou Thank you for the reviews; I think I addressed the relevant comments.
Just one more comment. Thank you for the updates!
cpp/src/arrow/testing/random.cc
void rand_day_millis(int64_t N, std::vector<DayTimeIntervalType::DayMilliseconds>* out) {
  const int random_seed = 0;
  std::default_random_engine gen(random_seed);
Since we're using pcg32_fast in RandomArrayGenerator, use it here too?
done.
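One way to act on that suggestion without hard-coding a particular generator is to make the engine a template parameter. The sketch below assumes a hypothetical `DayMillis` struct standing in for `DayTimeIntervalType::DayMilliseconds` and a hypothetical `RandDayMillis` helper; it is not the code that was merged. `std::mt19937` is used in the usage example only because PCG engines such as `pcg32_fast` are not in the C++ standard library; any engine that can be constructed from a seed of its `result_type` would fit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical stand-in for DayTimeIntervalType::DayMilliseconds.
struct DayMillis {
  int32_t days;
  int32_t milliseconds;
};

// Appends n random day/millisecond pairs to *out using a seeded Engine, so
// the same seed always yields the same sequence for a given engine type.
template <typename Engine>
void RandDayMillis(int64_t n, typename Engine::result_type seed,
                   std::vector<DayMillis>* out) {
  assert(out != nullptr);
  Engine gen(seed);
  std::uniform_int_distribution<int32_t> dist(
      std::numeric_limits<int32_t>::min(), std::numeric_limits<int32_t>::max());
  out->reserve(out->size() + static_cast<size_t>(n));
  for (int64_t i = 0; i < n; ++i) {
    // Braced initializers are evaluated left to right, so days is drawn first.
    out->push_back(DayMillis{dist(gen), dist(gen)});
  }
}
```

Usage: `RandDayMillis<std::mt19937>(100, 42, &values);`. Keeping the engine generic lets a test file switch to the same engine the array generator uses with a one-line change.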
// Equality-test fixtures: pd1 equals pdEq1; pd2 differs only in nanoseconds, pd3 only in days.
PeriodDuration pd1 = new PeriodDuration(Period.of(1, 2, 3), Duration.ofNanos(123));
PeriodDuration pdEq1 = new PeriodDuration(Period.of(1, 2, 3), Duration.ofNanos(123));
PeriodDuration pd2 = new PeriodDuration(Period.of(1, 2, 3), Duration.ofNanos(12));
PeriodDuration pd3 = new PeriodDuration(Period.of(1, 2, 0), Duration.ofNanos(123));
Maybe we need some cases where the month/day/nano values are negative?
done.
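Negative components matter because all three fields are signed, so they must survive serialization intact. A minimal sketch, assuming a hypothetical `MonthDayNano` struct and a little-endian 16-byte layout (Arrow's default byte order); `Encode` and `Decode` are illustrative helpers, not Arrow APIs. The casts rely on two's-complement conversion, which C++20 guarantees and mainstream compilers already implement.

```cpp
#include <array>
#include <cstdint>

// Hypothetical struct mirroring the signed MONTH_DAY_NANO fields.
struct MonthDayNano {
  int32_t months;
  int32_t days;
  int64_t nanoseconds;
};

// Writes months (bytes 0-3), days (4-7), and nanoseconds (8-15) little-endian.
constexpr std::array<uint8_t, 16> Encode(const MonthDayNano& v) {
  std::array<uint8_t, 16> out{};
  const uint32_t m = static_cast<uint32_t>(v.months);
  const uint32_t d = static_cast<uint32_t>(v.days);
  const uint64_t ns = static_cast<uint64_t>(v.nanoseconds);
  for (int i = 0; i < 4; ++i) out[i] = static_cast<uint8_t>(m >> (8 * i));
  for (int i = 0; i < 4; ++i) out[4 + i] = static_cast<uint8_t>(d >> (8 * i));
  for (int i = 0; i < 8; ++i) out[8 + i] = static_cast<uint8_t>(ns >> (8 * i));
  return out;
}

// Reassembles the three fields and reinterprets them as signed values.
constexpr MonthDayNano Decode(const std::array<uint8_t, 16>& b) {
  uint32_t m = 0;
  uint32_t d = 0;
  uint64_t ns = 0;
  for (int i = 0; i < 4; ++i) m |= static_cast<uint32_t>(b[i]) << (8 * i);
  for (int i = 0; i < 4; ++i) d |= static_cast<uint32_t>(b[4 + i]) << (8 * i);
  for (int i = 0; i < 8; ++i) ns |= static_cast<uint64_t>(b[8 + i]) << (8 * i);
  return MonthDayNano{static_cast<int32_t>(m), static_cast<int32_t>(d),
                      static_cast<int64_t>(ns)};
}
```

A round-trip test over values like `{-1, -2, -3}` or `{-14, 3, -123}` exercises the sign bits of every field.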
Thanks for your update.
Are the Rust integration failures expected? The CI builds are green on git master.
@jorgecarleitao investigated and did not think it was related to this change, but rather a latent bug.
@pitrou @liyafan82 Any more feedback? Otherwise I'd like to get this merged so I can start on the follow-up items from the vote (I'll merge at the end of the week unless there are more comments or more concerns about the integration tests).
+1, thank you
The Java changes LGTM. Good job, @emkornfield!
Thanks for the reviews. Merging.
Trying to formalize [mailing list discussion](https://lists.apache.org/thread.html/rd919c4ed8ad2f2827a2d4f665d8da99e545ba92ef992b2e557831751%40%3Cdev.arrow.apache.org%3E)

Closes apache#10177 from emkornfield/interval

Lead-authored-by: Micah Kornfield
Co-authored-by: emkornfield
Signed-off-by: Micah Kornfield
#10177 added the interval type Month, Day, Nano to the flatbuffer and C++/Java. This updates the flatbuffer-generated files and adds support for the type to Go.

Closes #11310 from zeroshade/arrow-13804-month-day-nano

Authored-by: Matthew Topol
Signed-off-by: Matthew Topol