BatchProcessor can write points to the wrong retention policy. #162
Comments
I think I know about this issue. I am going to merge the #108 PR soon, which should hopefully fix it (though it may introduce a number more).
We are interested in how this goes. Now that I understand the problem a bit more, it is likely that we have data going into the abyss quite frequently.
I could give the jar from the master branch a few cycles on my machine, but to really stress it in our main environment, I would need a release (v2.3).
Yes, it is something the current BatchProcessor does on the sly. Would you be willing to:
We have worked around this temporarily by creating an org.influxdb.InfluxDB object/connection for each measurement. This way InfluxDB cannot incorrectly choose a retention policy belonging to a different measurement when building the batched write (a sketch of this workaround follows below). Even though this works, it would be nice not to need a separate object/connection to the database for each measurement.
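A minimal sketch of that workaround, assuming a local server; the class name, URL, credentials, batch settings, and field names are illustrative, only the database and RP names come from the logs in this issue:

import java.util.concurrent.TimeUnit;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class PerMeasurementConnections {
    public static void main(String[] args) {
        // One batching-enabled connection per measurement, so each internal
        // BatchProcessor only ever sees a single retention policy.
        InfluxDB evtSysDb = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        evtSysDb.enableBatch(100, 200, TimeUnit.MILLISECONDS);

        InfluxDB evtTestDb = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        evtTestDb.enableBatch(100, 200, TimeUnit.MILLISECONDS);

        // Each write goes through its own connection, so a batch can never
        // mix retention policies.
        evtSysDb.write("dev", "evtsys_rp",
                Point.measurement("evtsys").addField("value", 1.0).build());
        evtTestDb.write("dev", "evttest_rp",
                Point.measurement("evttest").addField("value", 1.0).build());
    }
}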
@larrywalker @majst01 I also found another issue similar to this one, and I will fix this one after #239 because I want to avoid too many code conflicts. Thanks.
We are writing 2 events to 2 different measurements at the same time, each with its own retention policy (RP).
With batching enabled, if you are writing to measurements with different RPs, one of the measurements can sometimes be written with the wrong RP. A query of the data with the correct RP then makes it seem as though the measurement was never written.
Stepping through the code slows the writes down enough that the events are put into different batches, and they are then written to the correct measurements with the correct RPs. A minimal sketch reproducing the problem is below.
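For illustration, a repro might look like the following; the class name, URL, credentials, batch settings, and field names are assumptions on my part, while the database and RP names match the logs below:

import java.util.concurrent.TimeUnit;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class RpBatchRepro {
    public static void main(String[] args) {
        // Single batching-enabled connection shared by both measurements.
        InfluxDB influxDB = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "root");
        influxDB.enableBatch(100, 5000, TimeUnit.MILLISECONDS);

        // Both writes fall into the same flush interval. The BatchProcessor
        // groups them into one BatchPoints keyed by database "dev" only, so
        // the second point can end up under "evtsys_rp" instead of "evttest_rp".
        influxDB.write("dev", "evtsys_rp",
                Point.measurement("evtsys").addField("value", 1.0).build());
        influxDB.write("dev", "evttest_rp",
                Point.measurement("evttest").addField("value", 1.0).build());
    }
}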
ONLY ONE EVENT STORED (EVENTS BATCHED INTO 1 RP)
[http] 2016/04/17 05:58:58 127.0.0.1 - root [17/Apr/2016:05:58:58 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2933 - okhttp/2.7.5 01037114-0483-11e6-910c-000000000000 11.485795ms
[wal] 2016/04/17 05:59:03 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 05:59:03 write to index of partition 1 took 6.381651ms
[http] 2016/04/17 05:59:03 127.0.0.1 - root [17/Apr/2016:05:59:03 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 041a23d6-0483-11e6-910d-000000000000 1.456205ms
[wal] 2016/04/17 05:59:09 Flush due to idle. Flushing 2 series with 2 points and 290 bytes from partition 1
[wal] 2016/04/17 05:59:09 write to index of partition 1 took 1.16645ms
BOTH EVENTS STORED CORRECTLY – DIFFERENT RPs – stepping through the code
[wal] 2016/04/17 06:03:13 write to index of partition 1 took 6.276625ms
[wal] 2016/04/17 06:03:23 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:23 write to index of partition 1 took 6.882536ms
[http] 2016/04/17 06:03:29 127.0.0.1 - root [17/Apr/2016:06:03:29 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a2a56255-0483-11e6-9119-000000000000 1.397597ms
[http] 2016/04/17 06:03:31 127.0.0.1 - root [17/Apr/2016:06:03:31 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evttest_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a3b7eaaa-0483-11e6-911a-000000000000 1.512795ms
[wal] 2016/04/17 06:03:33 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:33 write to index of partition 1 took 7.765287ms
[wal] 2016/04/17 06:03:34 Flush due to idle. Flushing 1 series with 1 points and 69 bytes from partition 1
[wal] 2016/04/17 06:03:34 write to index of partition 1 took 1.112102ms
[query] 2016/04/17 06:03:35 SELECT * FROM dev.evttest_rp.evttest WHERE deveui = '00250C0100000300' AND time > now() - 1d
[http] 2016/04/17 06:03:35 127.0.0.1 - root [17/Apr/2016:06:03:35 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2949 - okhttp/2.7.5 a5f59547-0483-11e6-911b-000000000000 6.442473ms
[wal] 2016/04/17 06:03:36 Flush due to idle. Flushing 1 series with 1 points and 221 bytes from partition 1
[wal] 2016/04/17 06:03:36 write to index of partition 1 took 1.298728ms
I think this code should only batch points that share the same retention policy:
Map<String, BatchPoints> databaseToBatchPoints = Maps.newHashMap();
List batchEntries = new ArrayList<>(this.queue.size());
this.queue.drainTo(batchEntries);
for (BatchEntry batchEntry : batchEntries) {
String dbName = batchEntry.getDb();
if (!databaseToBatchPoints.containsKey(dbName)) {
BatchPoints batchPoints = BatchPoints.database(dbName).retentionPolicy(batchEntry.getRp()).build();
databaseToBatchPoints.put(dbName, batchPoints);
}
Point point = batchEntry.getPoint();
databaseToBatchPoints.get(dbName).point(point);
}
for (BatchPoints batchPoints : databaseToBatchPoints.values()) {
BatchProcessor.this.influxDB.write(batchPoints);
}
} catch (Throwable t) {
// any exception would stop the scheduler
logger.log(Level.SEVERE, "Batch could not be sent. Data will be lost", t);
}
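One possible fix, sketched under the same types as the code above and not necessarily how the eventual patch works, is to key the map by database AND retention policy, so points with different RPs never share a BatchPoints:

Map<String, BatchPoints> databaseToBatchPoints = Maps.newHashMap();
List<BatchEntry> batchEntries = new ArrayList<>(this.queue.size());
this.queue.drainTo(batchEntries);

for (BatchEntry batchEntry : batchEntries) {
    String dbName = batchEntry.getDb();
    String rp = batchEntry.getRp();
    // Composite key: one BatchPoints per (database, retention policy) pair.
    String key = dbName + "\u0000" + rp;
    BatchPoints batchPoints = databaseToBatchPoints.get(key);
    if (batchPoints == null) {
        batchPoints = BatchPoints.database(dbName).retentionPolicy(rp).build();
        databaseToBatchPoints.put(key, batchPoints);
    }
    batchPoints.point(batchEntry.getPoint());
}

for (BatchPoints batchPoints : databaseToBatchPoints.values()) {
    BatchProcessor.this.influxDB.write(batchPoints);
}

This keeps one HTTP write per (database, RP) combination per flush, at the cost of potentially more requests per flush than the database-only grouping.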