Add container logs collector for containerd runtime #73
Conversation
add default .gitattributes file and git add --renormalize
💡 Thank you for taking the time for this, I have added a few initial review comments below. At the moment I don't quite understand the use case (will it be opt-in, or does it always need to be there?), so I have a few basic build-up questions as well.
@@ -49,6 +50,7 @@ func main() {
	collectors = append(collectors, dnsCollector)
	collectors = append(collectors, kubeObjectsCollector)
	collectors = append(collectors, networkOutboundCollector)
	collectors = append(collectors, containerDLogsCollector)
💡 Thanks for this. I have a few questions to understand more about this work: what is the use case when the vanilla aks-periscope tool is run with this, please? Can you add an example in the description?
A few questions:
So currently this tool runs on an AKS cluster; does this work enable this critical command within the collector to always collect container logs?
Will there be any scenario where a user would not want this auto-enabled?
Actually, if you run the current aks-periscope on a public AKS cluster, it will NOT collect any container logs, as AKS has switched from docker to containerd as the default container runtime. I think this new collector should replace the existing "docker" container collector.
As for crictl: crictl is the CLI tool for interacting with containerd (more precisely, with CRI). Please read this article for details: https://www.tutorialworks.com/difference-docker-containerd-runc-crio-oci/
	return err
}

output, err := utils.RunCommandOnHost("crictl", "ps", "-a", "-o", "json")
💡 https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/
I am wondering, will this be an issue?
No, I don't think so. We are not using crictl to create anything; we are using it to retrieve container metadata and retrieve logs.
As we want to remove the RunCommandOnHost, could we have a look at https://github.com/containerd/containerd instead?
containerID := container.ID
containerNameSpace, ok := container.Labels["io.kubernetes.pod.namespace"]
if !ok {
	log.Printf("Unable to retrieve namespace of container: %+v", containerID)
Minor, consistency: see line 75 --> https://github.com/Azure/aks-periscope/pull/73/files#diff-a767a7e0a1997c77cf6883c7707d9559ab33c388e714275ffd439509335825f3R75 - the message starts with upper case.
Sorry, I'm a bit confused by this comment. Should they both start with lowercase?
Error messages created with fmt.Errorf should start with a lowercase letter and use %w to wrap errors; messages passed to log.Printf should start with a capital letter, and you can't use %w there. So you did good here, nothing to change :)
if strings.HasPrefix(containerName, containerLogParts[1]) && containerNameSpace == targetNameSpace {
	containerNames[containerID] = containerName
}
} else {
💡 This is interesting: what is this if-else clause doing? I.e., in both cases it produces the same containerNames[containerID] = containerName result set?
if len(containerLogParts) == 2 {
if strings.HasPrefix(containerName, containerLogParts[1]) && containerNameSpace == targetNameSpace {
containerNames[containerID] = containerName
}
} else {
if containerNameSpace == targetNameSpace {
containerNames[containerID] = containerName
}
}
Based on the README file in the root, users can specify containers like this:
az aks kollect \
-g MyResourceGroup \
-n MyManagedCluster \
--container-logs "mynamespace1/mypod1 myns2"
So the --container-logs flag accepts two formats: "namespace/container" or just "namespace".
Here the logic is: if the length of containerLogParts is 2, meaning the input matches the "namespace/container" format, we need to make sure both the namespace and the container name match. Otherwise, we only need to check whether the namespace matches.
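The filtering logic described above can be sketched as a single predicate (a hypothetical `matchesFilter` helper, not the PR's exact code):

```go
package main

import (
	"fmt"
	"strings"
)

// matchesFilter reports whether a container should be collected for one
// --container-logs entry, which is either "namespace" or "namespace/prefix".
func matchesFilter(filter, containerNamespace, containerName string) bool {
	parts := strings.SplitN(filter, "/", 2)
	if containerNamespace != parts[0] {
		return false
	}
	if len(parts) == 2 {
		// "namespace/container" form: the container name must also match.
		return strings.HasPrefix(containerName, parts[1])
	}
	// Bare "namespace" form: the namespace match alone is enough.
	return true
}

func main() {
	fmt.Println(matchesFilter("mynamespace1/mypod1", "mynamespace1", "mypod1-abc"))
	fmt.Println(matchesFilter("myns2", "myns2", "anything"))
	fmt.Println(matchesFilter("myns2", "other", "anything"))
}
```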
BTW, if you check the existing container collector, it's the same logic over there.
Isn't it a duplicate of DIAGNOSTIC_CONTAINERLOGS_LIST? We might want to remove this.
--container-logs "mynamespace1/mypod1 myns2" is used with az aks kollect (the az CLI command); under the hood this just sets the value of DIAGNOSTIC_CONTAINERLOGS_LIST before deploying.
	continue
}

containerName, ok := container.Labels["io.kubernetes.container.name"]
💡 How is this containerName used? I.e., see the line below where the code has !ok: the logic still continues. Does that mean containerName will always be there? But !ok suggests otherwise.
The assumption here is that not all containers are managed by Kubernetes, so when we try to retrieve the container name using this label, the key "io.kubernetes.container.name" may not exist. In that case we just skip collecting logs for this container, so we use continue to skip this iteration of the for loop.
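The skip-on-missing-label pattern described above, as a self-contained sketch (the `container` struct and `collectNames` helper are illustrative stand-ins, not the PR's code):

```go
package main

import (
	"fmt"
	"log"
)

type container struct {
	ID     string
	Labels map[string]string
}

// collectNames maps container ID to name, skipping containers that are not
// managed by Kubernetes, i.e. those missing the container-name label.
func collectNames(containers []container) map[string]string {
	names := map[string]string{}
	for _, c := range containers {
		name, ok := c.Labels["io.kubernetes.container.name"]
		if !ok {
			log.Printf("Skipping container without Kubernetes labels: %s", c.ID)
			continue // not a Kubernetes-managed container
		}
		names[c.ID] = name
	}
	return names
}

func main() {
	got := collectNames([]container{
		{ID: "a", Labels: map[string]string{"io.kubernetes.container.name": "coredns"}},
		{ID: "b", Labels: map[string]string{}}, // e.g. started outside Kubernetes
	})
	fmt.Println(got)
}
```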
}

for containerID, containerName := range containerNames {
	containerLog := filepath.Join(rootPath, targetNameSpace+"_"+containerName)
💡 targetNameSpace+"_"+containerName - what does this lead to, please? Maybe add a comment? Also, could we concatenate the strings in another way?
Similarly, if you see the existing docker container collector, it's the same logic -- we just borrow the code from there. But sure, we can improve this.
targetNameSpace := containerLogParts[0]
containerNames := map[string]string{}

for _, container := range containers.ContainersList {
Runtime performance can be improved here. This code can potentially range over the entire set of node containers as many times as there are fields in DIAGNOSTIC_CONTAINERLOGS_LIST. It should be possible to achieve the same result in one pass by using a map.
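The suggested one-pass approach could look like this: index the filters by namespace first, then walk the container list once. This is a sketch of the optimization idea, not the PR's code; the struct and function names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

type container struct {
	ID, Namespace, Name string
}

// selectContainers walks the container list once. Filters (each either
// "namespace" or "namespace/prefix") are pre-indexed by namespace, so
// per-container matching is a map lookup instead of a rescan of all filters.
func selectContainers(containers []container, filters []string) map[string]string {
	prefixesByNS := map[string][]string{}
	for _, f := range filters {
		parts := strings.SplitN(f, "/", 2)
		if len(parts) == 2 {
			prefixesByNS[parts[0]] = append(prefixesByNS[parts[0]], parts[1])
		} else {
			// An empty prefix matches every container in the namespace.
			prefixesByNS[parts[0]] = append(prefixesByNS[parts[0]], "")
		}
	}

	selected := map[string]string{}
	for _, c := range containers {
		for _, p := range prefixesByNS[c.Namespace] {
			if strings.HasPrefix(c.Name, p) {
				selected[c.ID] = c.Name
				break
			}
		}
	}
	return selected
}

func main() {
	got := selectContainers(
		[]container{{"a", "myns1", "mypod1-x"}, {"b", "myns2", "anything"}, {"c", "other", "x"}},
		[]string{"myns1/mypod1", "myns2"},
	)
	fmt.Println(len(got))
}
```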
Note I have also implemented a containerd containerLog collector here: My implementation is slightly different in that instead of using crictl I just directly read the relevant log files from /var/log/containers. Your implementation is complementary in that it allows collecting logs from only those containers which are running, whereas mine will collect all containers that have ever run on the node (and that match one of the filters). Suggest this PR is merged first, and then I will pull it into my branch and add a simple collector-config flag which will allow selecting either "onlyRunning" or "all" behaviour, as I can see value in supporting both.
I didn't look at the implementation, but I like the idea better since we won't use any RunCmd and we won't have to rely on any tool installed on the host / container. Additionally, files can be streamed, and that could be interesting for future iterations.
if err != nil {
	return err
}
err = utils.WriteToFile(containerLog, output)
You need to rebase this branch; we are not using WriteToFile anymore. I'll approve once you have rebased and changed this code.
Please don't merge this PR @arnaud-tincelin. It has been superseded by the approach in my KIND PR.
Will be following David's approach instead.
Azure Stack has switched its container runtime from docker to containerd. aks-periscope needs to support container log collection under the new runtime. This is the current error received when trying to save logs:
Error: Collector: containerlogs, collect data failed: Fail to run command on host: exit status 127
Therefore, I added a new container log collector to work with containerd. This collector works similarly to the existing container log collector but uses slightly different logic to avoid using docker commands.
It uses the crictl command to retrieve a list of all containers and filters to the ones in the targeted namespace (based on user input). Then it retrieves the logs using the container ID and saves them to the storage account.
This change has been tested on both Azure Stack and public Azure and succeeds in collecting logs on both, under a folder called containerdlogs. This change is imperative since the runtime switch has already been made, so there is no working container log collector as of now.