
Data retention enhancement #108

Closed
wants to merge 21 commits into from

Conversation

andrewdodd
Contributor

These are enhancements to allow data retention in the batch processor. Please see the commits for more info.

andrewdodd and others added 15 commits October 21, 2015 10:31
The excessive use of 'this.' is confusing, as it implies access to an
object attribute/method that needs disambiguation.
The non-abbreviated field and method names are clearer.
These changes introduce two helper functions for writing batched and
unmatched points separately. This helps by:
 - Removing the need to wrap 'unbatched' points in a 'BatchPoints' object
 - Locating the updates of the counters in a sensible location
These changes:
 - Add the ConsistencyLevel parameter to both the single and batched
   write methods. This was previously hidden from the user (it defaulted
   to ConsistencyLevel.ONE).
 - Add a batched write method that takes a List<Point> parameter,
   rather than requiring a BatchPoints object.
 - Deprecate the BatchPoints object and both of the previous write()
   methods.
 - Provide implementations of the deprecated interface that use the new
   interface.
 - Update the tests to use the new interface, though the old one will
   still work.
 - Fix the BatchProcessor to use all of the common fields on the
   BatchEntry objects when performing the write (i.e. the current
   implementation keys the Map<String, BatchPoints> object only on
   the database name, but it should use all of the common fields).
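The deprecation pattern in that commit list can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: the interface, method names, and the use of plain Strings for points and consistency levels are all assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the deprecation pattern the commits describe:
// the old write() entry point is kept, but delegates to the new method
// that takes a point list and an explicit consistency level. All names
// and signatures here are assumptions, not the PR's actual API.
public class WriteShimSketch {
    interface Writer {
        // New interface: explicit consistency level, plain list of points.
        int write(String database, String consistencyLevel, List<String> points);

        // Old interface: kept for compatibility, forwards to the new one
        // with the previously hidden default of ConsistencyLevel.ONE.
        @Deprecated
        default int write(String database, List<String> points) {
            return write(database, "ONE", points);
        }
    }

    // Trivial implementation that just reports how many points it "wrote".
    static final Writer WRITER = (db, level, points) -> points.size();

    public static void main(String[] args) {
        List<String> points = Arrays.asList("cpu idle=90", "cpu idle=85");
        System.out.println(WRITER.write("mydb", points)); // deprecated path still works
    }
}
```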

Justification

1. The current interface forces the user to create a BatchPoints object,
   even if they already have a list of Point objects that they want to
   send.

2. The BatchPoints object does not offer much convenience. The one
   feature that has not been reproduced is the ability to put the same
   tag on all points within a batch. I do not believe this is necessary
   because:
     - This tag information should be created and stored on the Point
       itself; and
     - It is pretty easy to add to the Point class a function that
       applies the same tag to a collection of points.
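The helper suggested above might look roughly like this. 'SketchPoint' is a stand-in type, since the real library builds immutable Points through a builder, where such a helper would have to live instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the suggested helper: apply one tag to every point
// in a collection. SketchPoint is an illustrative stand-in, not the
// library's actual Point class.
public class TagAllSketch {
    static class SketchPoint {
        final Map<String, String> tags = new HashMap<>();
    }

    // Add the same tag key/value to every point in the collection.
    static void tagAll(List<SketchPoint> points, String key, String value) {
        for (SketchPoint p : points) {
            p.tags.put(key, value);
        }
    }
}
```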
Forgot to include the ConsistencyLevel.ONE parameter.
These changes add the configuration options and functionality to allow
the internal BatchProcessor's data retention behaviour to be configured.

The new configurable items are:

 - maxBatchWriteSize:
    The maximum number of points to attempt in one batch.

 - discardOnFailedWrite:
    Whether the BatchProcessor should discard batched points when it
    fails to write them (i.e. the current behaviour is to do this with
    buffered points).

 - BufferFailBehaviour:
    The behaviour the BatchProcessor should exhibit when adding to the
    queue fails (due to capacity problems, either implicit or explicit).
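How the BufferFailBehaviour option might be consulted when a point is offered to a full buffer can be sketched as below. The enum constant names are assumed from the commit message, not taken from the PR's code.

```java
// Illustrative sketch of the failure-behaviour decision when the internal
// buffer is full. Constant names are assumptions, not the PR's actual API.
public class BufferPolicySketch {
    enum BufferFailBehaviour { THROW_EXCEPTION, DROP_CURRENT, DROP_OLDEST }

    /** Returns true if the new point should be enqueued (possibly after eviction). */
    static boolean shouldEnqueueWhenFull(BufferFailBehaviour behaviour) {
        switch (behaviour) {
            case THROW_EXCEPTION:
                throw new IllegalStateException("buffer full");
            case DROP_CURRENT:
                return false;  // silently discard the incoming point
            case DROP_OLDEST:
                return true;   // caller evicts the head first, then adds
            default:
                throw new AssertionError("unreachable");
        }
    }
}
```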
Fixing the public API to include the required enums and configuration methods.
Cleaning up the attribute definitions in BatchProcessor.
Adding 'interrogation' functions to the InfluxDB interface, for investigating
the state of the underlying buffer. NB: This now requires synchronisation, eek!
This library could really do with some logging.
These changes make the 'write()' call happen in the worker thread, rather than
in the 'put()' thread, even when the 'actions' trigger is reached.
These changes introduce an exponential backoff for the flush based retry for
when batched writes fail in the scheduled attempts.

The backoff doubles the flushIntervalMin value, up until the newly added
flushIntervalMax value.

The other changes are to expose the new configuration items.
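The backoff arithmetic described in that commit can be sketched as follows. The doubling-up-to-a-cap rule is from the commit message; the method name and the example interval values are assumptions.

```java
// Sketch of the exponential backoff described above: the flush retry
// interval doubles from flushIntervalMin until it is capped at
// flushIntervalMax. The arithmetic is from the commit message; the
// method name and units are assumptions.
public class BackoffSketch {
    static int nextFlushInterval(int currentInterval, int flushIntervalMax) {
        // Double the interval, capped at the configured maximum.
        return Math.min(currentInterval * 2, flushIntervalMax);
    }

    public static void main(String[] args) {
        int interval = 100; // flushIntervalMin, e.g. milliseconds
        int max = 1600;     // flushIntervalMax
        for (int failedWrites = 0; failedWrites < 6; failedWrites++) {
            System.out.print(interval + " ");
            interval = nextFlushInterval(interval, max);
        }
        // prints: 100 200 400 800 1600 1600
    }
}
```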
@Kindrat

Kindrat commented Oct 23, 2015

Consider making a fork.

@majst01
Collaborator

majst01 commented Oct 23, 2015

I like your approach, but I need some time during the weekend to review it carefully.
Do you think it is worth pulling it in now, or should we make a 2.1 version instead?

@andrewdodd
Contributor Author

Hey,

First, I think it is up to you. You are the primary maintainer, so in most ways it is your choice. I don't think I have enough of a 'different direction' to really warrant making a fork. If people really like my additions they can go to andrewdodd/influxdb-java, right?

I have been thinking about this set of commits, especially with respect to the existing library. In some ways, my additions are expanding the scope of the API quite a lot, and designing for the general case they are addressing will be hard. (i.e. at least the existing library 'forces' its users to think about data retention explicitly). However, I have to admit, I'm not a big fan of the 'auto-batching' option in the current library, precisely because it silently drops the data it is claiming to buffer and batch.

Regarding these commits, I had a few goes at getting to the design I wanted (as you can see from the commit log), so please look at the end state more than the intervening states. (I was trying to get too cute by picking the right underlying queue etc, but in the end I just resorted to the one queue and some helper functions).

If you are interested in incorporating this stuff I can include some more comments/etc, I'm just a bit busy trying to deploy this with my work atm.

Also, I'm not sure what is in the 2.0 / 2.1 release roadmap, so I guess you would have to decide. Maybe if other people also review the changes and think they are a good idea?

Cheers,
Andrew

PS: I should include a comment on the addAndDropIfNecessary() method that it is a copy of the add() method from Guava's EvictingQueue.
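The method mentioned in the postscript can be sketched as below, mirroring the add() logic of Guava's EvictingQueue: when the bounded queue is at capacity, evict the head before adding the new element. The generic signature and queue type are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of addAndDropIfNecessary(), mirroring the add() behaviour of
// Guava's EvictingQueue: when the buffer is at capacity, drop the oldest
// element to make room. The signature here is an assumption.
public class EvictingAddSketch {
    static <E> boolean addAndDropIfNecessary(Queue<E> queue, int maxSize, E element) {
        if (queue.size() >= maxSize) {
            queue.remove(); // drop the oldest buffered point
        }
        return queue.add(element);
    }

    public static void main(String[] args) {
        Queue<Integer> buffer = new ArrayDeque<>();
        for (int i = 1; i <= 3; i++) {
            addAndDropIfNecessary(buffer, 2, i);
        }
        System.out.println(buffer); // [2, 3]
    }
}
```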

@andrewdodd andrewdodd closed this Dec 14, 2015
@andrewdodd andrewdodd reopened this Jan 17, 2016
…to DataRetentionEnhancement

Conflicts:
	src/main/java/org/influxdb/InfluxDB.java
	src/main/java/org/influxdb/impl/InfluxDBImpl.java
	src/test/java/org/influxdb/dto/PointTest.java
# Conflicts:
#	src/main/java/org/influxdb/impl/BatchProcessor.java
#	src/test/java/org/influxdb/InfluxDBTest.java
#	src/test/java/org/influxdb/PerformanceTests.java
#	src/test/java/org/influxdb/TicketTests.java
#	src/test/java/org/influxdb/dto/PointTest.java
#	src/test/java/org/influxdb/impl/BatchProcessorTest.java
final int flushIntervalMax,
final TimeUnit flushIntervalTimeUnit) {

enableBatch(0,

"Capacity should be > 0 or NULL", change it to null.

@andrewdodd
Contributor Author

Despite this being very useful for me (and perhaps some other people), I think this 'auto-batching' feature is actually something that should be handled by the clients of this library.

I think the 'batch' mode should just be another write option, and we should not magically batch in the background. I think this will avoid (if not solve) a number of issues, such as:

  • what to do on failure
  • how to support async
