-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data retention enhancement #108
Data retention enhancement #108
Conversation
The excessive use of 'this.' is confusing, as it implies access to an object attribute/method that needs disambiguation.
The non-abbreviated field and method names are clearer.
These changes introduce two helper functions for writing batched and unmatched points separately. This helps by: - Removing the need to wrap 'unlatched' points in a 'BatchPoints' object - Locating the updates of the counters in a sensible location
These changes: - Add the ConsistencyLevel parameter to both the single and batched write methods. This was previously hidden from the user (defaulted to ConsistencyLevel.ONE). - Added a batched write method that takes a List<Point> parameter, rather than requiring a BatchPoints object. - Deprecated the BatchPoints object and both of the previous write() methods. - Provide implementations of the deprecated interface that use the new interface. - Updates the tests to use the new interface, though the old one will still work. - Fixes the BatchProcessor to use all of the common fields on the BatchEntry objects when performing the write (i.e. the current implementation creates the Map<String, BatchPoints> object only on the database name, but it should use all of the common fields). Justification 1. The current interface forces the user to create a BatchPoints object, even if they already have a list of Point objects that they want to send. 2. The BatchPoints object does not offer much convenience. The one feature that has not been reproduced is the ability to put the same tag on all points within a batch. I do not believe this is necessary because: - This tag information should be created and stored on the Point itself; and - It is pretty easy to add a function to add the same tag to a collection of points into the Point class.
Forgot to include the ConsistencyLevel.ONE parameter.
These changes add the configuration options and functionality to allow the internal BatchProcessor's data retention behaviour to be configured. The new configurable items are: - maxBatchWriteSize: The maximum number of points to attempt in one batch. - discardOnFailedWrite: If the BatchProcessor should throw away batched points if it fails to successfully write them (i.e. the current behaviour is to do this with buffered points). - BufferFailBehaviour: The behaviour the BatchProcessor should exhibit when adding to the queue fails (due to capacity problems, either implicit or explicit).
Fixing the public API to include the required enums and configuration methods. Cleaning up the attribute definitions in BatchProcessor.
Adding 'interrogation' functions to the InfluxDB interface, for investigating the state of the underlying buffer. NB: This now requires synchronisation, eek!
This library could really do with some logging.
… DataRetentionEnhancement
…ataRetentionEnhancement
These changes make the 'write()' call happen in the worker thread, rather than in the 'put()' thread, even when the 'actions' trigger is reached.
These changes introduce an exponential backoff for the flush based retry for when batched writes fail in the scheduled attempts. The backoff doubles the flushIntervalMin value, up until the newly added flushIntervalMax value. The other changes are to expose the new configuration items.
Consider making fork) |
I like your approach, but i need some time during the weekend to review it carefully. |
Hey, First, I think it is up to you. You are the primary maintainer, so in most ways it is your choice. I don't think I have enough of a 'different direction' to really warrant making a fork. If people really like my additions they can go to andrewdodd/influxdb-java, right? I have been thinking about this set of commits, especially with respect to the existing library. In some ways, my additions are expanding the scope of the API quite a lot, and designing for the general case they are addressing will be hard. (i.e. at least the existing library 'forces' its users to think about data retention explicitly). However, I have to admit, I'm not a big fan of the 'auto-batching' option in the current library, precisely because it silently drops the data it is claiming to buffer and batch. Regarding these commits, I had a few goes at getting to the design I wanted (as you can see from the commit log), so please look at the end state more than the intervening states. (I was trying to get too cute by picking the right underlying queue etc, but in the end I just resorted to the one queue and some helper functions). If you are interested in incorporating this stuff I can include some more comments/etc, I'm just a bit busy trying to deploy this with my work atm. Also, I'm not sure what is in the 2.0 / 2.1 release roadmap, so I guess you would have to decide. Maybe if other people also review the changes and think they are a good idea? Cheers, PS: I should include a comment on the addAndDropIfNecessary() method that it is a copy of the add() method from Guava's EvictingQueue. |
…to DataRetentionEnhancement Conflicts: src/main/java/org/influxdb/InfluxDB.java src/main/java/org/influxdb/impl/InfluxDBImpl.java src/test/java/org/influxdb/dto/PointTest.java
# Conflicts: # src/main/java/org/influxdb/impl/BatchProcessor.java # src/test/java/org/influxdb/InfluxDBTest.java # src/test/java/org/influxdb/PerformanceTests.java # src/test/java/org/influxdb/TicketTests.java # src/test/java/org/influxdb/dto/PointTest.java # src/test/java/org/influxdb/impl/BatchProcessorTest.java
final int flushIntervalMax, | ||
final TimeUnit flushIntervalTimeUnit) { | ||
|
||
enableBatch(0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Capacity should be > 0 or NULL", change it to null.
Despite this being very useful for me (and perhaps some other people), I think this 'auto-batching' feature is actually something that should be handled by the clients of this library. I think the 'batch' mode should just be another write option, and we should not magically batch in the background. I think this will avoid (if not solve) a number of issues, such as:
|
These are enhancements to allow data retention in the batch processor. Please see the commit for more info.