Lag error on ks_test #43

astanway · 2013-08-18T19:20:22Z

I occasionally get this error:

ERROR:root:Algorithm error: Traceback (most recent call last):
  File "/home/astanway/skyline/src/analyzer/algorithms.py", line 263, in run_selected_algorithm
    ensemble = [globals()[algorithm](timeseries) for algorithm in ALGORITHMS]
  File "/home/astanway/skyline/src/analyzer/algorithms.py", line 206, in ks_test
    adf = sm.tsa.stattools.adfuller(reference, 10)
  File "/usr/lib64/python2.6/site-packages/statsmodels/tsa/stattools.py", line 201, in adfuller
    xdall = lagmat(xdiff[:,None], maxlag, trim='both', original='in')
  File "/usr/lib64/python2.6/site-packages/statsmodels/tsa/tsatools.py", line 305, in lagmat
    raise ValueError("maxlag should be < nobs")
ValueError: maxlag should be < nobs

Any clues? cc @mabrek

Re: f886000

mabrek · 2013-08-19T09:11:52Z

The error means that there are not enough datapoints for the test. What resolution (interval between observations) do you use?
I used this test for metrics with 2-second resolution. Initially it was 1s, but I found quite high sampling jitter caused by the Linux kernel VM stats update interval (which is 1s).
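
A minimal sketch of estimating resolution from the data itself, assuming a timeseries is a list of (timestamp, value) pairs sorted by timestamp. `estimate_resolution` is a hypothetical helper, not part of the project; it takes the median gap between timestamps so that occasional jitter of the kind described above does not skew the estimate.

```python
from statistics import median


def estimate_resolution(timeseries):
    """Hypothetical helper: estimate the sampling interval in seconds.

    Assumes `timeseries` is a list of (timestamp, value) pairs sorted by
    timestamp. The median gap is used so that a few jittery samples do
    not dominate the estimate.
    """
    if len(timeseries) < 2:
        return None  # a single point has no interval
    timestamps = [t for t, _ in timeseries]
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return median(gaps)


# 2-second metric with one late and one early sample.
series = [(0, 1.0), (2, 1.1), (4, 0.9), (7, 1.0), (8, 1.2), (10, 1.0)]
print(estimate_resolution(series))  # 2
```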

astanway · 2013-08-19T12:24:51Z

I use a 10-second resolution, with lots of variation in overall sample size. Is there a hard number on the minimum number of datapoints needed for this statistic?

mabrek · 2013-08-19T12:29:05Z

Yes, there is a hard limit of 10 datapoints in the reference part (between an hour ago and 10 minutes ago):
adf = sm.tsa.stattools.adfuller(reference, 10)
I can guard it with an 'if' condition.
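
A sketch of such a guard, under stated assumptions: the windows follow the description above (reference from an hour ago to 10 minutes ago, probe the last 10 minutes), the timeseries is a list of (timestamp, value) pairs, and `split_windows` / `enough_for_adf` are hypothetical names. The threshold is read off the traceback: `adfuller` passes the differenced series (`xdiff`, presumably one row shorter than `reference`) to `lagmat`, which requires `maxlag < nobs`. That is a necessary condition, not necessarily a sufficient one, so a real guard may want a larger margin.

```python
import time

HOUR = 3600
TEN_MINUTES = 600
ADF_MAXLAG = 10  # the maxlag passed in adfuller(reference, 10)


def split_windows(timeseries, now=None):
    """Split (timestamp, value) pairs into reference and probe value lists."""
    if now is None:
        now = time.time()
    hour_ago = now - HOUR
    ten_minutes_ago = now - TEN_MINUTES
    reference = [v for t, v in timeseries if hour_ago <= t < ten_minutes_ago]
    probe = [v for t, v in timeseries if t >= ten_minutes_ago]
    return reference, probe


def enough_for_adf(reference, maxlag=ADF_MAXLAG):
    """True if the differenced reference (len - 1 rows) has more rows than maxlag."""
    return len(reference) - 1 > maxlag


# A 10-second metric covering the last hour.
now = 10000
series = [(t, 1.0) for t in range(now - HOUR, now, 10)]
reference, probe = split_windows(series, now=now)
print(len(reference), len(probe), enough_for_adf(reference))  # 300 60 True
```

In ks_test, the adfuller call would then only run when `enough_for_adf(reference)` holds, presumably treating a too-short series as "no anomaly".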

astanway · 2013-08-19T12:31:27Z

Ah, I see - yeah, I think a conditional there would be safer.

mabrek · 2013-08-20T08:15:13Z

I've added the conditional.

As a side note, there might be some confusion in the way algorithms select the data range to check for anomalies.

Checking the last N datapoints gives different results on metrics with different resolutions. If an anomaly is detected on the last datapoint, or even the last 3 datapoints, of a metric with 2-second resolution, that anomaly might disappear within 10 seconds. If the metric has a resolution of 5 minutes, there is plenty of time for a human to notice the detected anomaly.

Checking the last N minutes would not provide enough datapoints for some algorithms (like ks_test) on low-resolution metrics.
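
The trade-off can be put in rough numbers. The helpers below are hypothetical and ignore jitter and missing samples: an anomalous point stays among the last N datapoints for about N * resolution seconds, and a window of W seconds holds about W / resolution datapoints.

```python
def seconds_in_tail(n_points, resolution):
    """Approximate time a datapoint remains among the last n_points samples."""
    return n_points * resolution


def points_in_window(window_seconds, resolution):
    """Approximate number of samples falling inside a window of window_seconds."""
    return window_seconds // resolution


for resolution in (2, 10, 300):
    print(
        resolution,
        seconds_in_tail(3, resolution),      # last 3 datapoints, in seconds
        points_in_window(600, resolution),   # last 10 minutes, in datapoints
        points_in_window(3000, resolution),  # 50-minute ks_test reference part
    )
```

At 5-minute resolution the 50-minute reference part holds only about 10 datapoints, right at the limit discussed above, while the last 3 datapoints span 15 minutes; at 2-second resolution the same 3 datapoints span only 6 seconds.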

astanway · 2013-08-20T11:41:19Z

That is correct. Perhaps a new setting is needed: TAIL_AVERAGE_SIZE?
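
A minimal sketch of what a count-based TAIL_AVERAGE_SIZE could look like. The setting name comes from the comment above; its default of 3 and the function `tail_avg_by_count` are assumptions for illustration, not the project's code.

```python
TAIL_AVERAGE_SIZE = 3  # hypothetical setting; the default is an assumption


def tail_avg_by_count(timeseries, size=TAIL_AVERAGE_SIZE):
    """Mean of the last `size` values of a list of (timestamp, value) pairs."""
    if not timeseries or size < 1:
        raise ValueError("need a non-empty timeseries and size >= 1")
    tail = [value for _, value in timeseries[-size:]]
    return sum(tail) / len(tail)  # uses fewer points if the series is short


series = [(t, float(t)) for t in range(1, 6)]  # values 1.0 .. 5.0
print(tail_avg_by_count(series))           # 4.0
print(tail_avg_by_count(series, size=10))  # 3.0
```

Note that a fixed size still covers very different spans of time: 3 datapoints are 6 seconds on a 2-second metric but 15 minutes on a 5-minute one.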

mabrek · 2013-08-20T12:19:19Z

Metrics with different resolutions might be present in the same environment, so a single size won't fit them all. Maybe it's better to use time to cut the tail off the sequence (TAIL_TIME)?
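
A sketch of the time-based alternative, assuming TAIL_TIME is measured in seconds back from the newest datapoint (the thread does not specify this). The value of 30 and the function `tail_avg_by_time` are hypothetical.

```python
TAIL_TIME = 30  # hypothetical setting: seconds of tail to average


def tail_avg_by_time(timeseries, tail_time=TAIL_TIME):
    """Mean of values whose timestamps fall within tail_time of the last one.

    The window is (last_timestamp - tail_time, last_timestamp], so the last
    datapoint is always included and the average is never empty.
    """
    if not timeseries or tail_time <= 0:
        raise ValueError("need a non-empty timeseries and tail_time > 0")
    last_timestamp = timeseries[-1][0]
    tail = [v for t, v in timeseries if t > last_timestamp - tail_time]
    return sum(tail) / len(tail)


fine = [(t, float(t)) for t in range(0, 101, 2)]  # 2-second metric
coarse = [(0, 1.0), (300, 2.0), (600, 3.0)]       # 5-minute metric
print(tail_avg_by_time(fine))    # 86.0 (15 points: 72 .. 100)
print(tail_avg_by_time(coarse))  # 3.0  (only the last point)
```

On a coarse metric the window degrades gracefully to the last datapoint instead of reaching back hours. Scanning the whole list keeps the sketch short; since timestamps are sorted, `bisect` could locate the cut point instead.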

astanway · 2013-09-07T15:09:25Z

I'm going to close this out, but can you please raise another issue with a case for TAIL_TIME and pragmatic resolution checking?

mabrek mentioned this issue Aug 20, 2013: feed enough data into ks and adf tests #44 (Merged)

astanway closed this as completed Sep 7, 2013

mabrek mentioned this issue Sep 11, 2013: add TAIL_INTERVAL configuration setting to use in tail_avg #47 (Open)
