
Add container logs collector for containerd runtime #73

Closed
wants to merge 2 commits

Conversation

shwetamurali
Contributor

Azure Stack has switched its container runtime from docker to containerd. aks-periscope needs to support container log collection under the new runtime. This is the current error received when trying to save logs:
Error: Collector: containerlogs, collect data failed: Fail to run command on host: exit status 127

Therefore, I added a new container log collector that works with containerd. This collector works similarly to the existing container log collector but uses slightly different logic to avoid docker commands.

It uses the crictl command to retrieve a list of all containers and filters to the ones in the targeted namespace (based on user input). Then it retrieves the logs using the container ID and saves them to the storage account.
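A minimal sketch of that flow, for illustration only: the struct shape, function name, and the write-to-storage step are assumptions rather than the exact code in this PR, while utils.RunCommandOnHost is the existing helper used elsewhere in aks-periscope.

package collector

import (
	"encoding/json"
	"log"

	"github.com/Azure/aks-periscope/pkg/utils"
)

// containerList mirrors just enough of the `crictl ps -a -o json` output for
// filtering; only the fields needed here are declared.
type containerList struct {
	Containers []struct {
		ID     string            `json:"id"`
		Labels map[string]string `json:"labels"`
	} `json:"containers"`
}

// collectContainerdLogs lists all CRI containers, keeps those in the target
// namespace, and fetches their logs by container ID.
func collectContainerdLogs(targetNamespace string) error {
	output, err := utils.RunCommandOnHost("crictl", "ps", "-a", "-o", "json")
	if err != nil {
		return err
	}

	var containers containerList
	if err := json.Unmarshal([]byte(output), &containers); err != nil {
		return err
	}

	for _, c := range containers.Containers {
		// Keep only containers whose pod namespace matches the user input.
		if c.Labels["io.kubernetes.pod.namespace"] != targetNamespace {
			continue
		}
		logs, err := utils.RunCommandOnHost("crictl", "logs", c.ID)
		if err != nil {
			log.Printf("Unable to collect logs for container %s: %v", c.ID, err)
			continue
		}
		_ = logs // saving to the storage account is omitted from this sketch
	}
	return nil
}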

This change has been tested on both Azure Stack and public Azure and succeeds in collecting logs on both, under a folder called containerdlogs. This change is needed because the runtime switch has already been made, so there is currently no working container log collector.

@Tatsinnit Tatsinnit requested review from arnaud-tincelin, Tatsinnit, davidkydd and JunSun17 and removed request for arnaud-tincelin June 24, 2021 02:35
Member

@Tatsinnit Tatsinnit left a comment


💡 Thank you for taking the time for this; I have added a few initial review comments below. At the moment I don't quite understand the use case (will it be opt-in, or does it always need to be there?), so I have a few basic build-up questions as well.

@@ -49,6 +50,7 @@ func main() {
collectors = append(collectors, dnsCollector)
collectors = append(collectors, kubeObjectsCollector)
collectors = append(collectors, networkOutboundCollector)
collectors = append(collectors, containerDLogsCollector)
Member


💡 Thanks for this. I have a few questions to understand more about this work: what is the use case when the vanilla aks-periscope tool is run with this, please? Can you add an example to the description?

A few questions:

So currently this tool runs on an AKS cluster; does this work enable this collector to always collect container logs?

Will there be any scenario where a user would not want this auto-enabled?


@songjiaxun songjiaxun Jun 24, 2021


Actually, if you run the current aks-periscope on a public AKS cluster, it will NOT collect any container logs, as AKS has switched from docker to containerd as the default container runtime. I think this new collector should replace the existing "docker" container collector.
As for crictl, it is the CLI tool for interacting with containerd (more precisely, with CRI). Please read this article for details: https://www.tutorialworks.com/difference-docker-containerd-runc-crio-oci/

return err
}

output, err := utils.RunCommandOnHost("crictl", "ps", "-a", "-o", "json")
Member


💡 https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/

I am wondering, will this be an issue?

[screenshot from the crictl documentation]


No, I don't think so. We are not using crictl to create anything; we are using it to retrieve container metadata and retrieve logs.

Collaborator


As we want to remove the RunCommandOnHost helper, could we have a look at https://github.com/containerd/containerd instead?
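For reference, a minimal sketch of what the containerd Go client could look like here; the socket path and the "k8s.io" namespace are assumptions for illustration, not code from this PR.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func listKubernetesContainers() error {
	// Connect to the containerd socket on the node.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		return fmt.Errorf("connecting to containerd: %w", err)
	}
	defer client.Close()

	// Kubernetes-managed containers live in the "k8s.io" containerd namespace.
	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	containers, err := client.Containers(ctx)
	if err != nil {
		return fmt.Errorf("listing containers: %w", err)
	}

	for _, c := range containers {
		labels, err := c.Labels(ctx)
		if err != nil {
			continue
		}
		fmt.Printf("%s -> pod namespace %s\n", c.ID(), labels["io.kubernetes.pod.namespace"])
	}
	return nil
}

func main() {
	if err := listKubernetesContainers(); err != nil {
		log.Fatal(err)
	}
}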

containerID := container.ID
containerNameSpace, ok := container.Labels["io.kubernetes.pod.namespace"]
if !ok {
log.Printf("Unable to retrieve namespace of container: %+v", containerID)
Member


Contributor Author


Sorry, I'm a bit confused by this comment. Should they both start with lowercase?

Collaborator


fmt.Errorf messages should start with a lowercase letter and use %w to wrap errors.
log.Printf messages should start with a capital letter, and you can't use %w there. So you did well here; nothing to change :)
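A tiny self-contained illustration of that convention (the error text here is made up):

package main

import (
	"errors"
	"fmt"
	"log"
)

var errCrictl = errors.New("crictl exited with status 1")

func listContainers() error {
	// fmt.Errorf: message starts lowercase, %w wraps the underlying error.
	return fmt.Errorf("failed to list containers: %w", errCrictl)
}

func main() {
	if err := listContainers(); err != nil {
		// log.Printf: message starts with a capital letter; use %v, since
		// %w is only meaningful inside fmt.Errorf.
		log.Printf("Unable to list containers: %v", err)
	}
}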

if strings.HasPrefix(containerName, containerLogParts[1]) && containerNameSpace == targetNameSpace {
containerNames[containerID] = containerName
}
} else {
Member


💡 This is interesting: what is this if-else clause doing?

i.e. in both cases it produces the containerNames[containerID] = containerName result set?

		if len(containerLogParts) == 2 {
			if strings.HasPrefix(containerName, containerLogParts[1]) && containerNameSpace == targetNameSpace {
				containerNames[containerID] = containerName
			}
		} else {
			if containerNameSpace == targetNameSpace {
				containerNames[containerID] = containerName
			}
		}


@songjiaxun songjiaxun Jun 24, 2021


Based on the README file in the root, users can specify containers like this:

az aks kollect \
-g MyResourceGroup \
-n MyManagedCluster \
--container-logs "mynamespace1/mypod1 myns2"

So the --container-logs flag accepts two formats: "namespace/container" or just "namespace".
The logic here is: if the length of containerLogParts is 2, the input matches the "namespace/container" format, so we need to make sure both the namespace and the container name match. Otherwise, we only need to check whether the namespace matches.


BTW, if you check the existing container collector, it's the same logic over there.

Collaborator


Isn't it a duplicate of DIAGNOSTIC_CONTAINERLOGS_LIST? We might want to remove this

Collaborator


--container-logs "mynamespace1/mypod1 myns2" is used with az aks kollect (the az CLI command); under the hood this just sets the value of DIAGNOSTIC_CONTAINERLOGS_LIST before deploying.
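For context, a rough sketch of how a collector could read and split that environment variable; os.Getenv and the printed messages are illustrative assumptions, not the exact helper aks-periscope uses.

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// e.g. DIAGNOSTIC_CONTAINERLOGS_LIST="mynamespace1/mypod1 myns2",
	// as set by `az aks kollect --container-logs ...` before deployment.
	entries := strings.Fields(os.Getenv("DIAGNOSTIC_CONTAINERLOGS_LIST"))

	for _, entry := range entries {
		// Each entry is either "namespace" or "namespace/containerPrefix".
		containerLogParts := strings.Split(entry, "/")
		targetNamespace := containerLogParts[0]
		if len(containerLogParts) == 2 {
			fmt.Printf("collect containers prefixed %q in namespace %q\n", containerLogParts[1], targetNamespace)
		} else {
			fmt.Printf("collect all containers in namespace %q\n", targetNamespace)
		}
	}
}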

continue
}

containerName, ok := container.Labels["io.kubernetes.container.name"]
Member


💡 How is this containerName used? i.e. see the line below where the code has !ok but the logic still continues. Does that mean containerName will always be there? But !ok suggests otherwise.


@songjiaxun songjiaxun Jun 24, 2021


The assumption here is that not all containers are managed by Kubernetes, so when we try to retrieve the container name using this label, the key "io.kubernetes.container.name" may not exist. In that case we just skip collecting logs for this container, which is why we use continue to skip this iteration of the for loop.

}

for containerID, containerName := range containerNames {
containerLog := filepath.Join(rootPath, targetNameSpace+"_"+containerName)
Member


💡 What does targetNameSpace+"_"+containerName lead to, please? Maybe add a comment? Also, could we concatenate the strings in another way?


Similarly, if you look at the existing docker container collector, it's the same logic; we just borrowed the code from there. But sure, we can improve this.
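For example, a small hypothetical helper (the root path shown is just an example value) that makes the "<namespace>_<containerName>" naming a bit more self-documenting:

package main

import (
	"fmt"
	"path/filepath"
)

// logFilePath builds "<rootPath>/<namespace>_<containerName>",
// e.g. "/var/log/aks-periscope/kube-system_coredns".
func logFilePath(rootPath, namespace, containerName string) string {
	return filepath.Join(rootPath, fmt.Sprintf("%s_%s", namespace, containerName))
}

func main() {
	fmt.Println(logFilePath("/var/log/aks-periscope", "kube-system", "coredns"))
}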

@Tatsinnit Tatsinnit linked an issue Jun 24, 2021 that may be closed by this pull request
@shwetamurali shwetamurali changed the title Add container logs collector for container-d runtime Add container logs collector for containerd runtime Jun 24, 2021
targetNameSpace := containerLogParts[0]
containerNames := map[string]string{}

for _, container := range containers.ContainersList {
Member


Runtime performance can be improved here. This code can potentially range over the entire set of node containers as many times as there are fields in DIAGNOSTIC_CONTAINERLOGS_LIST. It should be possible to achieve the same result in one pass by using a map.
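A rough sketch of that single-pass idea; the type and function names are illustrative assumptions, not code from this PR.

package collector

import "strings"

// containerInfo is a minimal stand-in for the per-container data returned by crictl.
type containerInfo struct {
	ID        string
	Name      string
	Namespace string
}

// nsFilter records what we want from a single namespace.
type nsFilter struct {
	all      bool     // collect every container in the namespace
	prefixes []string // otherwise, collect containers whose name has one of these prefixes
}

// selectContainers builds a namespace lookup from the DIAGNOSTIC_CONTAINERLOGS_LIST
// entries once, then walks the node's containers a single time.
func selectContainers(containers []containerInfo, entries []string) map[string]string {
	wanted := map[string]*nsFilter{}
	for _, entry := range entries {
		parts := strings.SplitN(entry, "/", 2)
		f := wanted[parts[0]]
		if f == nil {
			f = &nsFilter{}
			wanted[parts[0]] = f
		}
		if len(parts) == 2 {
			f.prefixes = append(f.prefixes, parts[1])
		} else {
			f.all = true
		}
	}

	selected := map[string]string{} // container ID -> container name
	for _, c := range containers {
		f, ok := wanted[c.Namespace]
		if !ok {
			continue
		}
		if f.all {
			selected[c.ID] = c.Name
			continue
		}
		for _, p := range f.prefixes {
			if strings.HasPrefix(c.Name, p) {
				selected[c.ID] = c.Name
				break
			}
		}
	}
	return selected
}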

@davidkydd
Collaborator

Note I have also implemented a containerd containerLog collector here:
https://github.com/Azure/aks-periscope/pull/69/files#diff-7520e2334630376456ed95388019d5b5a7abcd6f77f2003a2c68bb7afab5e237

My implementation is slightly different in that, instead of using crictl, I just directly read the relevant log files from /var/log/containers. Your implementation is complementary in that it collects logs only from containers that are currently running, whereas mine collects logs from all containers that have ever run on the node (and that match one of the filters).

I suggest this PR is merged first; then I will pull it into my branch and add a simple collector-config flag that allows selecting either "onlyRunning" or "all" behaviour, as I can see value in supporting both.
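For reference, a minimal sketch of the file-based approach, assuming the usual kubelet layout where /var/log/containers holds symlinks named <pod>_<namespace>_<container>-<id>.log; this is not the code in PR #69.

package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// listContainerLogFiles returns the log files under /var/log/containers that
// belong to the given namespace, relying on the kubelet's
// "<pod>_<namespace>_<container>-<id>.log" naming convention.
func listContainerLogFiles(targetNamespace string) ([]string, error) {
	files, err := filepath.Glob("/var/log/containers/*.log")
	if err != nil {
		return nil, err
	}

	var matched []string
	for _, f := range files {
		parts := strings.Split(filepath.Base(f), "_")
		if len(parts) >= 3 && parts[1] == targetNamespace {
			matched = append(matched, f)
		}
	}
	return matched, nil
}

func main() {
	files, err := listContainerLogFiles("kube-system")
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}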

@arnaud-tincelin
Collaborator


I didn't look at the implementation, but I like this idea better since we won't use any RunCmd and we won't have to rely on any tool installed on the host or container. Additionally, files can be streamed, which could be interesting for future iterations.

if err != nil {
return err
}
err = utils.WriteToFile(containerLog, output)
Collaborator


You need to rebase this branch; we are not using WriteToFile anymore. I'll approve once you have rebased and changed this code.

Collaborator


Please don't merge this PR @arnaud-tincelin. It has been superseded by the approach in my KIND PR.

@Tatsinnit Tatsinnit removed the do-not merge ❌🔀 Avoid merging for now label Jul 7, 2021
@shwetamurali
Contributor Author

Will be following David's approach instead.


Successfully merging this pull request may close these issues.

Add new collector for logs with container-d runtime