-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New component: AWS Cloudwatch Logs Receiver #14449
Comments
I'll sponsor this. |
For the config, I have some suggestions after looking deeper. Cloudwatch appears to have hierarchy when it comes to Taking that EKS example further, if I had two dev clusters:
And I only care about getting the log streams prefixed with This leads me to believe the more correct solution would be to have the array of log group configs specify the A config proposal I think is more appropriate would be: region: "" # required
profile: "" # optional to specify AWS creds
logs:
poll_interval: 1m0s
event_limit: 1000 # limit the number of events we process per poll
log_group_limit: 50. # limit the number of log groups we spin up goroutines for and limit autodiscovery
log_group_prefix: "" # if no log groups are specified, this can limit discovery of log groups to match a prefix
log_groups:
- name: "" # name of the log group, required if not using autodiscover
stream_prefix: "" # prefix to filter log streams of the log group to those that match
log_streams: [""] # manually specify the names of the streams desired to poll from
#### if both stream_prefix and log_streams are empty, collect all
### if both log_groups and log_group_prefix are empty, collect all groups and all subsequent streams until
### limits are reached. For that example I think a config like this is preferable as it can properly contextualize which streams are coming from which groups. receivers:
awscloudwatch:
region: us-east-1
logs:
log_groups:
- name: /aws/eks/dev-eks-0/cluster
log_streams: [kube-api-server]
- name: /aws/eks/dev-eks-1/cluster
# collect all streams Just making sure that this is the preferred approach, do either of you have insight @djaglowski @cpheps ? |
@schmikei I think your proposal makes sense. I think the only way to achieve the control you described with the proposed solution is to create multiple instances of the receiver which is not user friendly. I think it makes sense to change the config to feel more natural for an Cloudwatch user. |
I generally agree with organizing streams under groups. A couple related thoughts:
With most of these suggestions applied: region: "" # required
profile: "" # optional to specify AWS creds
logs:
poll_interval: 1m0s
event_limit: 1000
groups:
- name: group1
streams:
- name: foo1
- name: foo2
- prefix: group_foo_*
limit: 50
order_by: name_desc
streams:
- name: stream1
- prefix: stream_*
limit: 40
order_by: name_desc |
Agreed probably could simplify things 👍
Probably a good idea! Thinking an embedded struct that decodes consistently is probably the preference?
AFAIK the API gives you a window of start time and end time and the results are sent back with oldest entries first with no control on how to As far as attaching limits, it appears we have 3 things we want to limit
One thing is that prefixes are not regex based, it appears to be a fuzzy string match based off my experience. So we'll probably need to document that. As well as the API documents
As well as we would be exponentially increasing the number of requests per string if possible I'd like to keep number of requests made per stream to the minimum since the API has a somewhat opinionated way of using prefixes to grab multiple strings. Just to say I am a tad hesitant if this is a good thing to allow users to do, but will look further. Let's be sure to readdress this in the upcoming code review to see if we can come up with something that makes sense. Overall I think this config is probably cleaner so I'll start spiking out something closely coupled with what you suggest. |
Having a config property for start time might be useful? Sometimes you don't always want logs from the start of time and want it from say year 2020. |
This is a good point. @schmikei, correct me if I'm wrong, but I think the way this should work is that logs are collected for the window between each poll interval. Basically: func Start() {
// Initialize the end time from scratch
endTime := time.Now() - pollInterval
// Check if a checkpoint was previous stored
// (eg. the collector was down for some time and we need to catch up)
if end, ok := storage.Get("last_checkpoint"); ok {
endTime = end
}
}
func poll() {
startTime = endTime // from last poll interval
endTime = time.Now()
} To @lewis262626's point, it would be useful to have the ability to pull logs from earlier time windows. I would consider this an alternate mode of operation though, something that could be added later. An optional config section could allow the user to specify the historical time window that should be pulled, and possible some settings to control how aggressively the logs are pulled. |
Yes I believe something like this could work something like this could work logically, however I have some reservations. One thing I'm struggling with is how a parameter like this would look in the config? Would it be preferred to have some kind of awscloudwatch:
polling_interval: 5m
region: us-west-1
logs:
start_time: "2022-10-07T16:50:52.664-0400"
This doesn't seem super user friendly, we could provide a I also am looking into your suggestions here and I feel like we could expose the endpoint as a config parameter! Will be looking into adding this |
@schmikei, what I'm suggesting primarily is that this should be a future enhancement. I think the appropriate next step for that enhancement would be for someone to open an issue proposing a specific config format and any other relevant details about the capability. |
The purpose and use-cases of the new component
Collect Log Events from AWS Cloudwatch and transpose them into OTel pdata Logs. A use case for this is for users with multi-cloud environments they can export logs from AWS into their desired log platform.
Example configuration for the component
The configuration has a
logs
sub config as cloudwatch also supports metrics. This leaves the config flexible to add metric collection later if desired. Cloudwatch metric scraping is currently feature request in #6226.Also, just including a link on AWS Profiles.
Telemetry data types supported
logs
Is this a vendor-specific component?
Sponsor (optional)
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: