Document reverse_dns remap function and add local caching
#4517
Comments
A closely related request mentioned on discord: doing a |
Noting, this issue is currently blocked due to decisions around where to draw the line with Remap functions (covered in #3740). See #5647 (comment). We are hesitant to add functions that issue network calls on a per-event basis since this could severely hinder performance. |
@binarylogic I briefly looked into #5647. It would be fairly trivial to add "runtime state" to VRL. Basically, we'd initialize the runtime at boot, and then allow state to be tracked by individual functions across executions. This would allow us to cache the DNS lookup calls (for the DNS TTL duration), which would significantly reduce the overhead. We can defer this until after launch, but it seems that would mostly resolve the downside of allowing network calls in functions, at least in this specific case. |
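To make the "runtime state" idea above concrete, here is a minimal sketch in Python rather than in Vector's own code. The class `ReverseDnsCache`, its parameters, and the fixed `ttl_seconds` are all hypothetical names chosen for illustration. Python's standard resolver call does not expose the record TTL, so a fixed lifetime stands in for "the DNS TTL duration".

```python
import socket
import time
from typing import Callable, Dict, Optional, Tuple


def system_reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP to a hostname with the OS resolver; None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


class ReverseDnsCache:
    """Hypothetical per-function state that outlives a single event."""

    def __init__(
        self,
        resolver: Callable[[str], Optional[str]] = system_reverse_lookup,
        ttl_seconds: float = 60.0,  # assumption: fixed TTL, not the record's TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps IP -> (expiry time, cached answer). Failures are cached as None.
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}
        self.misses = 0

    def lookup(self, ip: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no network call
        self.misses += 1
        name = self._resolver(ip)
        self._entries[ip] = (now + self._ttl, name)
        return name
```

With a cache like this created once at boot, only the first event per IP (and the first after each expiry) pays for a network round trip; every other event is a dictionary lookup.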
Is there any news regarding this kind of functionality? Secondly, would the current implementation allow you to specify the DNS server, or does it just take the one configured OS wide? |
Hi @jeromekleinen-kbc . Nothing yet, but it's still on our radar. This one is a bit more complex than other VRL functions given that it would require caching to provide acceptable throughput.
My instinct would just be to use the OS configured resolver, but would you find it useful to specify a set of DNS servers to use? We could make that an optional flag. |
@jszwedko my intent in specifying the DNS server is to set up a local DNS cache to avoid overloading the corporate DNS servers. I guess when there is native caching in Vector this becomes less important. Perhaps one use case could be that, depending on the hostname, you might want to pick a different DNS server, either because of the domain or because of some zonality to reduce latency. Native caching would be cool, but I guess it comes with its own challenges, e.g. do you just follow the TTL or do you let cache entries expire with different timings, what is the maximum size of the cache, etc. |
I agree, it makes sense to allow specifying optional custom DNS servers. We might not add it in the first iteration, but it would be fairly trivial to add as a follow-up.
The thinking right now would be:
|
It would be cool if both the max TTL and the maximum cache size could be configurable, but I understand that this would add two additional global options. Just for reference, this is what logstash offers: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dns.html |
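As a rough illustration of the two options requested above, the following Python sketch bounds a cache by entry count and caps every entry's lifetime. `BoundedTtlCache`, `max_entries`, and `max_ttl_seconds` are hypothetical names, not Vector options, and least-recently-used eviction is an assumption; the thread does not say which eviction policy would be used.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class BoundedTtlCache:
    """Hypothetical cache with a maximum size and a cap on each entry's TTL."""

    def __init__(
        self,
        max_entries: int = 1000,
        max_ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._max_ttl = max_ttl_seconds
        self._clock = clock
        # Ordered from least to most recently used: key -> (expiry, value).
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str, record_ttl_seconds: float) -> None:
        # Honor the record's TTL, but never keep an entry longer than the cap.
        ttl = min(record_ttl_seconds, self._max_ttl)
        self._entries[key] = (self._clock() + ttl, value)
        self._entries.move_to_end(key)
        # Evict least recently used entries once the size limit is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value

    def __len__(self) -> int:
        return len(self._entries)
```

The cap on TTL protects against records that advertise very long lifetimes, while the size limit keeps memory use predictable when the stream contains many distinct addresses.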
Requested in discord: https://discord.com/channels/742820443487993987/746070591097798688/870424592047550524 |
Thanks @ypid-geberit ! I'll leave this open until we publish it in the documentation (likely after we add caching). |
DNS Caching would be very useful for me and greatly improve performance. |
Which version of DNS cache can be added? |
No timeline yet, unfortunately. |
I suggested something similar (not having seen this ticket) in vectordotdev/vrl#720 recently, and I would also be happy to see a local cache for the performance gain. There needs to be a way to specify more granularly which lookups are performed, which is why I wrote my feature request with a fairly long list of standard DNS options that might be supported. Just doing reverse lookups is not good enough, I think: there is no need to limit lookups to PTR records, and I see that as kind of a strange thing to implement as a standalone function in the first place. I also believe the result of a DNS lookup (even if cached) should be structured as a DNSTAP-style message so that it can be parsed as flexibly as possible for all QTYPEs, not just PTRs. If the function is written so that it allows specifying things like timeouts and retries, this can be done in a way that is optimally performant without a high risk of blocking, especially if VRL client stub timeout values can be changed from the standard "seconds" to "milliseconds". The local cache will still get a copy of the response and store it even if the first Vector event has moved on after a timeout. The next request made after the forwarding DNS server has completed its lookup will get a very rapid reply, and the cache will keep the answer until the TTL expires. So one or two requests may suffer, but after that point things will accelerate greatly (less than 0.3ms for most answers in our case using UDP, even between fairly busy physical servers about 5 meters apart and separated by one switch, and faster still if the caching resolver is on localhost, possibly by an order of magnitude, but I haven't tested). 
While I appreciate Vector not allowing "footguns", it is also the case that without this particular footgun I will have to write an entirely separate downstream pipeline for many of the objects going through Vector, which sort of invalidates much of the savings I am getting from consolidating our event stream into processing by one codebase at the edge of our widely dispersed network. In other words: I must have DNS lookups done on specific sets of objects before sending them along; it's not just a "nice to have." Lua can do this (I think?) with os.execute and I suppose I need to start looking at that, but I suspect/assert it's possible to do this much faster from within a native function. |
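The timeout behavior described in this comment can be sketched as follows: the caller waits only briefly, but the lookup keeps running in the background and fills the cache for later events. This is an illustrative Python sketch, not a proposal for how VRL would implement it. `TimeoutTolerantResolver` and its parameters are hypothetical, the resolver is passed in by the caller, and TTL expiry is left out to keep the example short.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Optional


class TimeoutTolerantResolver:
    """Hypothetical wrapper: give up waiting quickly, keep resolving anyway."""

    def __init__(
        self,
        resolver: Callable[[str], str],
        timeout_seconds: float = 0.005,  # assumption: millisecond-scale budget
        workers: int = 4,
    ) -> None:
        self._resolver = resolver
        self._timeout = timeout_seconds
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._cache: Dict[str, str] = {}
        self._pending: Dict[str, Future] = {}

    def lookup(self, key: str) -> Optional[str]:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            # Share one in-flight lookup per key instead of issuing duplicates.
            future = self._pending.get(key)
            if future is None:
                future = self._pool.submit(self._resolve_and_store, key)
                self._pending[key] = future
        try:
            return future.result(timeout=self._timeout)
        except FutureTimeout:
            return None  # this event moves on; the lookup continues

    def _resolve_and_store(self, key: str) -> str:
        try:
            value = self._resolver(key)
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            with self._lock:
                self._pending.pop(key, None)

    def close(self) -> None:
        """Wait for in-flight lookups and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

In this sketch the first event for a slow address gets `None` after the timeout, while a later event for the same address is answered from the cache once the background lookup has finished.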
for sure caching would greatly help |
Reverse DNS lookup for an IP address.
Example
Result: