-
Notifications
You must be signed in to change notification settings - Fork 224
Extract parts of datetime #433
Extract parts of datetime #433
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgecarleitao I have added only month
and day
till now, wanted to get your feedback before I proceed further,
The idea is to combine DateLike components under one macro_rule, and TimeLike components on another.
src/compute/temporal.rs
Outdated
@@ -246,63 +228,90 @@ pub fn can_hour(data_type: &DataType) -> bool { | |||
) | |||
} | |||
|
|||
macro_rules! date_like { | |||
($component:ident, $array:ident, $data_type:path, $chrono_tz:ident) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having four arguments on a macro_rule, is definitely not very readable, I will try to bring this number down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to convert this to a generic with parameters A, D: DateLike F: Fn(D) -> A
, and pass |x| x.year()
, so that we do not rely on macros?
hour
and year
hour
and year
Codecov Report
@@ Coverage Diff @@
## main #433 +/- ##
==========================================
- Coverage 80.78% 80.00% -0.78%
==========================================
Files 372 371 -1
Lines 22651 22833 +182
==========================================
- Hits 18299 18268 -31
- Misses 4352 4565 +213
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good!
I am not sure ab out my comment, but I think it is worth a try to convert the macro to a generic with a function takes components out of the datelike object, but maybe it is not possible.
In general generics are more expressive and easier to read because they pinpoint the exact constraints (Trait) that are required to uphold the functionality.
src/compute/temporal.rs
Outdated
@@ -246,63 +228,90 @@ pub fn can_hour(data_type: &DataType) -> bool { | |||
) | |||
} | |||
|
|||
macro_rules! date_like { | |||
($component:ident, $array:ident, $data_type:path, $chrono_tz:ident) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to convert this to a generic with parameters A, D: DateLike F: Fn(D) -> A
, and pass |x| x.year()
, so that we do not rely on macros?
fb7849f
to
a5cfec4
Compare
@jorgecarleitao I quite wasn't able to reach I wasn't able to make trait bounds work on closures arguments, I tried I will also have to fix |
Let me investigate more and reduce the boilerplate. |
a5cfec4
to
f2df942
Compare
@jorgecarleitao Do you think Since at compile time we won't know if the input will be Also, this could be costly because it is applied to each element of the array, but I am not sure. Currently, we create one closure for each type with the help of pub fn hour(array: &dyn Array) -> Result<PrimitiveArray<u32>> {
match array.data_type() {
DataType::Date32 | DataType::Date64 | &DataType::Timestamp(_, None) => {
date_like(array, DataType::UInt32, |x| x.hour())
}
DataType::Time32(_) | DataType::Time64(_) => {
time_like(array, DataType::UInt32, |x| x.hour())
}
DataType::Timestamp(time_unit, Some(timezone_str)) => {
...
if let Ok(timezone) = timezone {
Ok(extract_impl(array, time_unit, timezone, |x| x.hour()))
} else {
chrono_tz(array, time_unit, timezone_str, |x| x.hour())
}
}
}
} |
Wow, really good changes. I think it is worth trying the |
f2df942
to
f63000d
Compare
@jorgecarleitao Unfortunately I had to go back to |
f63000d
to
d672229
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You clearly enjoy writing simple, readable code in Rust =) I went through it and it is very pleasant to read it. I left minor suggestions, but feel free to iterate on it until you are comfortable.
One thing missing is the tests. We do not need to cover all variations since we are using macros to not duplicate code. IMO having tests for the timestamp with and without timezone is important, in particular
- extract
hour
from a timestamp on a summer time. See here for an example where there is a summer time shift. - extract
day
of a leap day (29th of Feb). See here for an example of one.
A note: formally second(date64)
should be 0. It is a bit misleading in the arrow specification, but date64
is a multiple of 86400000
(see here) 🤯. IMO we should enforce it on the array construction, so it is fine to keep it like this here.
d631187
to
c49451a
Compare
7db3763
to
1a86a0e
Compare
ae517e7
to
8576ec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @jorgecarleitao
nanosecond
method hasn't been properly tested yet,
Do you think we should add or remove any of the time extraction methods?
Let me know when you think it is ready (since it is still marked as a draft) |
@jorgecarleitao Was trying just now to add a test that uses TimeUnit::Nanosecond, I wanted to check for today's date represented by Nanoseconds (163265170202000000000), Int64Array is not enough, Int128Array does not support Datestamp logical type, so wasn't sure what to do I will try for Time64 with TimeUnit::Nanosecond |
hour
and year
hour
and year
hour
and year
Thank you again for this amazing work and PR 🙇 . I have modified the title and description to cover the great work done here so that anyone looking through the changelog can track it to this PR. |
This PR extends the datetime extract methods to support:
to
Date32
,Date64
,Time32(_)
andTime64(_)
,Timestamp(_, None)
andTimestamp(_, Some(_))
, thereby offering broad support to use dates, times and timestamps. For timestamps with timezone, these methods do take leap days (Feb 29th) and hours (summer time) into account.Closes #415