Feat/slice intersect multi series #2592
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #2592 +/- ##
==========================================
- Coverage 94.20% 94.16% -0.05%
==========================================
Files 141 141
Lines 15491 15501 +10
==========================================
+ Hits 14594 14596 +2
- Misses 897 905 +8
Thanks for this PR @ymatzkevich, it looks really good already 🚀
Just had some minor suggestions here and there. After that we can merge
In order to compare between the different options available to implement
import time
import itertools

import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.timeseries import slice_intersect
from darts.utils.utils import generate_index


def helper_test_intersect(freq, is_mixed_freq: bool, N):
    start = pd.Timestamp("20130101") if isinstance(freq, str) else 0
    freq = pd.tseries.frequencies.to_offset(freq) if isinstance(freq, str) else freq

    # handle identical and mixed frequency setup
    if not is_mixed_freq:
        freq_other = freq
        n_steps = 11
    elif "2" not in str(freq):  # 1 or "1D"
        freq_other = freq * 2
        n_steps = 21
    else:  # 2 or "2D"
        freq_other = freq / 2
        n_steps = 11
    freq_other = int(freq_other) if isinstance(freq_other, float) else freq_other

    idx = generate_index(start=start, freq=freq, length=n_steps)
    end = idx[-1]

    # we construct 2 different series that will be used for the intersection
    startA = start
    endA = end
    idxA = generate_index(startA, endA, freq=freq_other)
    seriesA = TimeSeries.from_series(pd.Series(range(len(idxA)), index=idxA))

    startB = start + freq
    endB = startB + 6 * freq_other
    idxB = generate_index(startB, endB, freq=freq_other)
    seriesB = TimeSeries.from_series(pd.Series(range(len(idxB)), index=idxB))

    iterations = 100  # to have a statistical sample from which we compute mean time
    start_time = time.time()
    for _ in range(iterations):
        sequence = [seriesA, seriesB] * N
        # we do not need to use the intersected sequence, just to compute it for benchmarking
        int_sequence = slice_intersect(sequence)
    end_time = time.time()

    time_taken = end_time - start_time
    mean_time = time_taken / iterations
    return mean_time


freq_list = ["D", "2D", 1, 2]  # different types of frequencies
is_mixed_freq_list = [False, True]  # mixed frequencies
N = 5  # determines size of sequence that we are intersecting
combinations = list(itertools.product(freq_list, is_mixed_freq_list))  # we test all combinations
length = len(combinations)  # number of combinations

total_time = 0
for i, (freq, is_mixed_freq) in enumerate(combinations):
    print(f"combination {i+1}/{length}: (freq,is_mixed_freq)=({freq},{is_mixed_freq})")
    total_time += helper_test_intersect(freq, is_mixed_freq, N)

mean_time = total_time / length
print(f"N={N}, mean_time={mean_time}")
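As a side note on the timing loop, `time.perf_counter()` is generally the better clock for measuring short durations, since it is monotonic and uses the highest-resolution timer available. A minimal sketch of the same averaging pattern follows; `mean_runtime` is a hypothetical helper name and the summed range is only a stand-in workload, neither is part of this PR:

```python
import time


def mean_runtime(fn, iterations=100):
    """Hypothetical helper: average wall-clock seconds per call of `fn`."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# Stand-in workload; in the benchmark this would be something like
# `lambda: slice_intersect(sequence)`.
print(f"mean_time={mean_runtime(lambda: sum(range(1_000))):.2e}")
```

Wrapping the measured call in a function keeps the setup (building the series) out of the timed region.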
The results of the benchmark are shown here, in seconds:
While using |
Thanks a lot @ymatzkevich for this very nice PR and performance report 💯
Everything looked fine, I took the opportunity to make some minor adaptations.
Now it's ready to be merged 🚀
Checklist before merging this PR:
Fixes #2042.
Summary
The function TimeSeries.slice_intersect() (see documentation) intersects a TimeSeries with another one so that they end up with the same time indices. However, if one wants to intersect multiple series, that function would need to be called several times, or the intersection would need to be done by hand using e.g. xarray. The new function slice_intersect() introduced with this PR solves this issue for an arbitrary number of TimeSeries.
Essentially, given a list of TimeSeries having the same time index type, slice_intersect() will output the aligned list, meaning that all TimeSeries in it will have the same start and end time (if the intersection exists).
Other Information
If the given TimeSeries do not all have the same time index type (e.g. some have a RangeIndex and some a DatetimeIndex), the function will raise an error.
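To illustrate the behaviour described above without relying on darts internals, here is a minimal sketch on plain pandas Series. `intersect_all` is a hypothetical helper written for this illustration; it is not the implementation in this PR, and the exception type it raises is an assumption:

```python
from functools import reduce

import pandas as pd


def intersect_all(series_list):
    """Hypothetical helper (not the darts implementation):
    restrict every Series to the index values they all share."""
    if not series_list:
        return []
    # Mixed index types (e.g. RangeIndex vs. DatetimeIndex) are rejected.
    index_types = {type(s.index).__name__ for s in series_list}
    if len(index_types) > 1:
        raise ValueError(f"Mixed index types: {sorted(index_types)}")
    # Fold Index.intersection over all indexes to get the common one.
    common = reduce(
        lambda left, right: left.intersection(right),
        (s.index for s in series_list),
    )
    return [s.loc[common] for s in series_list]


a = pd.Series(range(10), index=pd.date_range("2013-01-01", periods=10, freq="D"))
b = pd.Series(range(10), index=pd.date_range("2013-01-03", periods=10, freq="D"))
c = pd.Series(range(10), index=pd.date_range("2013-01-05", periods=10, freq="D"))

aligned = intersect_all([a, b, c])
for s in aligned:
    print(s.index[0].date(), s.index[-1].date(), len(s))
```

In this example all three aligned series span 2013-01-05 to 2013-01-10. Passing a Series with a default integer index alongside `a` would raise the `ValueError`, mirroring the index-type check described above.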