Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DNS cache logic and update default from 1s to 60s #1491

Merged
merged 4 commits into from
Nov 7, 2017

Conversation

dacjames
Copy link
Contributor

To my eyes, the dns caching logic was not fully implemented.

This converts to using rd_clock() and sets the rkb_t_rsal_last timestamp after the value is loaded.

I also adjust the default dns cache time to 60s. 1s is very low in practice; when you have a large number of clients, a low DNS caching time leads to a flood of DNS requests when a broker is restarted. Could split out the config change if necessary

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good, but the default change needs to go

@@ -596,17 +596,18 @@ rd_kafka_broker_send (rd_kafka_broker_t *rkb, rd_slice_t *slice) {
static int rd_kafka_broker_resolve (rd_kafka_broker_t *rkb) {
const char *errstr;

rd_ts_t now = rd_clock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for temp variable here, only used once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that’s your preference. As someone who recently spent a lot of time reading librdkafka from a place of relative ignorance, more temporary variables would have been very helpful.

@@ -284,7 +284,7 @@ static const struct rd_kafka_property rd_kafka_properties[] = {
_RK(broker_addr_ttl),
"How long to cache the broker address resolving "
"results (milliseconds).",
0, 86400*1000, 1*1000 },
0, 86400*1000, 60*1000 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS caching is performed by the resolver (typically libc) based on the records TTL setting, adding another layer of caching is inadvisable and will slow down the propagation of DNS record changes.

The initial default of broker.address.ttl was actually 5 minutes, but was decreased to 1s for this reason:
a0aff23
#494

The current default of 1s provides a bit of dampening, rather than caching.
A user that wants to dampen further can increase the configuration property.

@@ -624,6 +625,8 @@ static int rd_kafka_broker_resolve (rd_kafka_broker_t *rkb) {
"Failed to resolve '%s': %s",
rkb->rkb_nodename, errstr);
return -1;
} else {
rkb->rkb_t_rsal_last = rd_clock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use 8 whitespace tabs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to use tabs instead of spaces? The block above was inconsistent (the rd_kafka_broker_fail call is indented with spaces) so I wasn't sure what to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

librdkafka has mixed indents due to weird historical reasons, but all new code, and any changes to existing code, should use 8 whitespaces for tabs.

@dacjames
Copy link
Contributor Author

DNS caching is performed by the resolver (typically libc) based on the records TTL setting.

That’s not true. In general, libc does not perform DNS caching and major Linux distributions do not enable it by default.

I’ll remove it if you want but the current default is bad and will overload DNS servers during broker maintenance once you have a few hundred clients.

A more robust form of backoff when restablishing the broker connection may be a better solution to the DNS flooding problem, though. Even with a longer DNS cache, the current retry loop makes decommissioning a broker from a large, live system impossible.

@dacjames
Copy link
Contributor Author

@edenhill Applied changes per your review.

Tried to fix the whitespace but ending up confirming that it is already using tabs.

tabs

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small pr technicalities, then we're good to go!

@@ -624,6 +625,8 @@ static int rd_kafka_broker_resolve (rd_kafka_broker_t *rkb) {
"Failed to resolve '%s': %s",
rkb->rkb_nodename, errstr);
return -1;
} else {
rkb->rkb_t_rsal_last = rd_clock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

librdkafka has mixed indents due to weird historical reasons, but all new code, and any changes to existing code, should use 8 whitespaces for tabs.

@@ -284,7 +284,7 @@ static const struct rd_kafka_property rd_kafka_properties[] = {
_RK(broker_addr_ttl),
"How long to cache the broker address resolving "
"results (milliseconds).",
0, 86400*1000, 1*1000 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid unneccessary changes (for git blame to be convenient)

CONFIGURATION.md Outdated
@@ -27,7 +27,7 @@ socket.receive.buffer.bytes | * | 0 .. 100000000 | 0
socket.keepalive.enable | * | true, false | false | Enable TCP keep-alives (SO_KEEPALIVE) on broker sockets <br>*Type: boolean*
socket.nagle.disable | * | true, false | false | Disable the Nagle algorithm (TCP_NODELAY). <br>*Type: boolean*
socket.max.fails | * | 0 .. 1000000 | 3 | Disconnect from broker when this number of send failures (e.g., timed out requests) is reached. Disable with 0. NOTE: The connection is automatically re-established. <br>*Type: integer*
broker.address.ttl | * | 0 .. 86400000 | 1000 | How long to cache the broker address resolving results (milliseconds). <br>*Type: integer*
broker.address.ttl | * | 0 .. 86400000 | 60000 | How long to cache the broker address resolving results (milliseconds). <br>*Type: integer*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file was not re-generated after you changed the default back so it still contains the original change.
Remove from PR

@dacjames
Copy link
Contributor Author

dacjames commented Nov 7, 2017

@edenhill I think this resolves said technicalities.

@edenhill edenhill merged commit 2dae1f3 into confluentinc:master Nov 7, 2017
@edenhill
Copy link
Contributor

edenhill commented Nov 7, 2017

Big thanks for this!

@dacjames dacjames deleted the feature/dns-cache branch November 7, 2017 18:59
akhi3030 pushed a commit to akhi3030/librdkafka that referenced this pull request Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants