BatchProcessor can write points to the wrong retention policy. #162
Comments
I think I know about this issue. I am going to merge the #108 PR soon, which should hopefully fix it (though it may introduce a number more).
We are interested in how this goes. Now that I understand the problem a bit more, it is likely that we have data going into the abyss quite frequently.
I could give the jar from the master branch a few cycles on my machine, but to really stress it in our main environment, I would need a release (v2.3).
Yes, it is something the current BatchProcessor does on the sly. Would you be willing to:
We have worked around this temporarily by creating an org.influxdb.InfluxDB object/connection for each measurement. This way InfluxDB cannot incorrectly choose a retention policy belonging to a different measurement when building the batched write (a sketch of this workaround follows below). Even though this works, it would be nice not to need a separate object/connection to the database for each measurement.
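A minimal sketch of that workaround, assuming a local server; the class name, URL, credentials, batch settings, and field names are illustrative, only the database and RP names come from the logs in this issue:

import java.util.concurrent.TimeUnit;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class PerMeasurementConnections {
    public static void main(String[] args) {
        // One batching-enabled connection per measurement, so each internal
        // BatchProcessor only ever sees a single retention policy.
        InfluxDB evtSysDb = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        evtSysDb.enableBatch(100, 200, TimeUnit.MILLISECONDS);

        InfluxDB evtTestDb = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        evtTestDb.enableBatch(100, 200, TimeUnit.MILLISECONDS);

        // Each write goes through its own connection, so a batch can never
        // mix retention policies.
        evtSysDb.write("dev", "evtsys_rp",
                Point.measurement("evtsys").addField("value", 1.0).build());
        evtTestDb.write("dev", "evttest_rp",
                Point.measurement("evttest").addField("value", 1.0).build());
    }
}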
@larrywalker @majst01 I also found another issue similar to this one, and I will fix this one after #239 because I want to avoid too many code conflicts. Thanks.
We are writing 2 events to 2 different measurements at the same time, each with its own retention policy (RP).
With batching enabled, if you are writing to measurements with different RPs, one of the measurements can sometimes be written with the wrong RP. A query of the data with the correct RP then makes it seem as though the measurement was never written.
Stepping through the code slows the writes down enough that the events are put into different batches, and they are then written to the correct measurements with the correct RPs. A minimal sketch reproducing the problem is below.
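For illustration, a repro might look like the following; the class name, URL, credentials, batch settings, and field names are assumptions on my part, while the database and RP names match the logs below:

import java.util.concurrent.TimeUnit;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class RpBatchRepro {
    public static void main(String[] args) {
        // Single batching-enabled connection shared by both measurements.
        InfluxDB influxDB = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        influxDB.enableBatch(100, 5000, TimeUnit.MILLISECONDS);

        // Both writes fall into the same flush interval. The BatchProcessor
        // groups them into one BatchPoints keyed by database "dev" only, so
        // the second point can end up under "evtsys_rp" instead of "evttest_rp".
        influxDB.write("dev", "evtsys_rp",
                Point.measurement("evtsys").addField("value", 1.0).build());
        influxDB.write("dev", "evttest_rp",
                Point.measurement("evttest").addField("value", 1.0).build());
    }
}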
ONLY ONE EVENT STORED (EVENTS BATCHED INTO 1 RP)
[http] 2016/04/17 05:58:58 127.0.0.1 - root [17/Apr/2016:05:58:58 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2933 - okhttp/2.7.5 01037114-0483-11e6-910c-000000000000 11.485795ms
[wal] 2016/04/17 05:59:03 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 05:59:03 write to index of partition 1 took 6.381651ms
[http] 2016/04/17 05:59:03 127.0.0.1 - root [17/Apr/2016:05:59:03 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 041a23d6-0483-11e6-910d-000000000000 1.456205ms
[wal] 2016/04/17 05:59:09 Flush due to idle. Flushing 2 series with 2 points and 290 bytes from partition 1
[wal] 2016/04/17 05:59:09 write to index of partition 1 took 1.16645ms
BOTH EVENTS STORED CORRECTLY – DIFFERENT RPs – stepping through the code
[wal] 2016/04/17 06:03:13 write to index of partition 1 took 6.276625ms
[wal] 2016/04/17 06:03:23 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:23 write to index of partition 1 took 6.882536ms
[http] 2016/04/17 06:03:29 127.0.0.1 - root [17/Apr/2016:06:03:29 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a2a56255-0483-11e6-9119-000000000000 1.397597ms
[http] 2016/04/17 06:03:31 127.0.0.1 - root [17/Apr/2016:06:03:31 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evttest_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a3b7eaaa-0483-11e6-911a-000000000000 1.512795ms
[wal] 2016/04/17 06:03:33 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:33 write to index of partition 1 took 7.765287ms
[wal] 2016/04/17 06:03:34 Flush due to idle. Flushing 1 series with 1 points and 69 bytes from partition 1
[wal] 2016/04/17 06:03:34 write to index of partition 1 took 1.112102ms
[query] 2016/04/17 06:03:35 SELECT * FROM dev.evttest_rp.evttest WHERE deveui = '00250C0100000300' AND time > now() - 1d
[http] 2016/04/17 06:03:35 127.0.0.1 - root [17/Apr/2016:06:03:35 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2949 - okhttp/2.7.5 a5f59547-0483-11e6-911b-000000000000 6.442473ms
[wal] 2016/04/17 06:03:36 Flush due to idle. Flushing 1 series with 1 points and 221 bytes from partition 1
[wal] 2016/04/17 06:03:36 write to index of partition 1 took 1.298728ms
I think this code should only batch points that share the same retention policy:
Map<String, BatchPoints> databaseToBatchPoints = Maps.newHashMap();
List batchEntries = new ArrayList<>(this.queue.size());
this.queue.drainTo(batchEntries);
for (BatchEntry batchEntry : batchEntries) {
String dbName = batchEntry.getDb();
if (!databaseToBatchPoints.containsKey(dbName)) {
BatchPoints batchPoints = BatchPoints.database(dbName).retentionPolicy(batchEntry.getRp()).build();
databaseToBatchPoints.put(dbName, batchPoints);
}
Point point = batchEntry.getPoint();
databaseToBatchPoints.get(dbName).point(point);
}
for (BatchPoints batchPoints : databaseToBatchPoints.values()) {
BatchProcessor.this.influxDB.write(batchPoints);
}
} catch (Throwable t) {
// any exception would stop the scheduler
logger.log(Level.SEVERE, "Batch could not be sent. Data will be lost", t);
}
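One possible fix, sketched under the same types as the code above and not necessarily how the eventual patch works, is to key the map by database AND retention policy, so points with different RPs never share a BatchPoints:

Map<String, BatchPoints> databaseToBatchPoints = Maps.newHashMap();
List<BatchEntry> batchEntries = new ArrayList<>(this.queue.size());
this.queue.drainTo(batchEntries);

for (BatchEntry batchEntry : batchEntries) {
    String dbName = batchEntry.getDb();
    String rp = batchEntry.getRp();
    // Composite key: one BatchPoints per (database, retention policy) pair.
    String key = dbName + "\u0000" + rp;
    BatchPoints batchPoints = databaseToBatchPoints.get(key);
    if (batchPoints == null) {
        batchPoints = BatchPoints.database(dbName).retentionPolicy(rp).build();
        databaseToBatchPoints.put(key, batchPoints);
    }
    batchPoints.point(batchEntry.getPoint());
}

for (BatchPoints batchPoints : databaseToBatchPoints.values()) {
    BatchProcessor.this.influxDB.write(batchPoints);
}

This keeps one HTTP write per (database, RP) combination per flush, at the cost of potentially more requests per flush than the database-only grouping.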