dragonfly v0.2.0 does not work well with dfdaemon local proxy #3744

Open
kelein opened this issue Jan 3, 2025 · 13 comments

kelein commented Jan 3, 2025

Bug report:

Dragonfly v0.2.0 does not work as expected when deployed with Helm chart dragonfly-1.2.28.

  1. On the first image pull, the dragonfly-client log (/var/log/dragonfly/dfdaemon/dfdaemon.log) shows no download activity, only host announcements:
2025-01-03T10:34:28.559341648+00:00  INFO run:announce_host: dragonfly-client/src/grpc/scheduler.rs:180: announce host to 10.155.68.235:8002 request=AnnounceHostRequest { host: Some(Host { id: "10.155.71.34-ip-10-155-71-34", r#type: 0, hostname: "ip-10-155-71-34", ip: "10.155.71.34", port: 4000, download_port: 4000, os: "linux", platform: "linux", platform_family: "unix", platform_version: "12", kernel_version: "5.10.228-219.884.amzn2.x86_64", cpu: Some(Cpu { logical_count: 48, physical_count: 48, percent: 0.31253084540367126, process_percent: 0.07668274641036987, times: None }), memory: Some(Memory { total: 1204529893376, available: 899274952704, used: 305254940672, used_percent: 0.0, process_used_percent: 0.0, free: 350690447360 }), network: Some(Network { tcp_connection_count: 0, upload_tcp_connection_count: 0, location: Some(""), idc: Some(""), download_rate: 0, download_rate_limit: 10737418240, upload_rate: 0, upload_rate_limit: 10737418240 }), disk: Some(Disk { total: 7517847420928, free: 7465397878784, used: 52449542144, used_percent: 0.6976670209878462, inodes_total: 0, inodes_used: 0, inodes_free: 0, inodes_used_percent: 0.0, read_bandwidth: 0, write_bandwidth: 54 }), build: Some(Build { git_version: "0.2.0", git_commit: Some("unknown"), go_version: None, rust_version: Some(""), platform: None }), scheduler_cluster_id: 0, disable_shared: false }), interval: Some(Duration { seconds: 300, nanos: 0 }) }
2025-01-03T10:34:28.559406752+00:00  INFO run:announce_host: dragonfly-client/src/grpc/scheduler.rs:180: announce host to 10.155.67.179:8002 request=AnnounceHostRequest { host: Some(Host { id: "10.155.71.34-ip-10-155-71-34", r#type: 0, hostname: "ip-10-155-71-34", ip: "10.155.71.34", port: 4000, download_port: 4000, os: "linux", platform: "linux", platform_family: "unix", platform_version: "12", kernel_version: "5.10.228-219.884.amzn2.x86_64", cpu: Some(Cpu { logical_count: 48, physical_count: 48, percent: 0.31253084540367126, process_percent: 0.07668274641036987, times: None }), memory: Some(Memory { total: 1204529893376, available: 899274952704, used: 305254940672, used_percent: 0.0, process_used_percent: 0.0, free: 350690447360 }), network: Some(Network { tcp_connection_count: 0, upload_tcp_connection_count: 0, location: Some(""), idc: Some(""), download_rate: 0, download_rate_limit: 10737418240, upload_rate: 0, upload_rate_limit: 10737418240 }), disk: Some(Disk { total: 7517847420928, free: 7465397878784, used: 52449542144, used_percent: 0.6976670209878462, inodes_total: 0, inodes_used: 0, inodes_free: 0, inodes_used_percent: 0.0, read_bandwidth: 0, write_bandwidth: 54 }), build: Some(Build { git_version: "0.2.0", git_commit: Some("unknown"), go_version: None, rust_version: Some(""), platform: None }), scheduler_cluster_id: 0, disable_shared: false }), interval: Some(Duration { seconds: 300, nanos: 0 }) }
  2. The client cache directory contains no cached data:
36K	/data/dragonfly/storage

Expected behavior:

  • Image pulls work through the local dfdaemon proxy
  • The Dragonfly client cache directory stores the task data

How to reproduce it:

helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
helm upgrade --install --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f values.yaml

NAME     	NAMESPACE       	REVISION	UPDATED                             	STATUS  	CHART           	APP VERSION
dragonfly	dragonfly-system	4       	2025-01-03 15:03:20.551231 +0800 CST	deployed	dragonfly-1.2.28	2.1.65
  • values.yaml
client:
  enable: true
  name: client
  nameOverride: ""
  fullnameOverride: ""
  maxProcs: ""
  image:
    registry: docker.io
    repository: dragonflyoss/client
    tag: v0.2.0
    digest: ""
    pullPolicy: IfNotPresent
    pullSecrets: []
  hostAliases: []
  hostPID: true
  hostIPC: true
  hostNetwork: true
  resources:
    requests:
      cpu: "0"
      memory: "0"
    limits:
      cpu: "2"
      memory: "4Gi"
  priorityClassName: ""
  nodeSelector: {}
  terminationGracePeriodSeconds:
  tolerations:
    - operator: "Exists"
  podAnnotations: {}
  podLabels: {}
  updateStrategy: {}
  statefulsetAnnotations: {}
  initContainer:
    resources:
      requests:
        cpu: "0"
        memory: "0"
      limits:
        cpu: "2"
        memory: "4Gi"
    image:
      registry: docker.io
      repository: busybox
      tag: latest
      digest: ""
      pullPolicy: IfNotPresent
  extraVolumes:
    - name: storage
      hostPath:
        path: /data/dragonfly/storage/
        type: DirectoryOrCreate
    - name: logs
      emptyDir: {}
  extraVolumeMounts:
    - name: storage
      mountPath: /data/dragonfly/storage/
    - name: logs
      mountPath: /var/log/dragonfly/dfdaemon/
  dfinit:
    enable: true
    image:
      registry: docker.io
      repository: dragonflyoss/dfinit
      tag: v0.2.0
      digest: ""
      pullPolicy: IfNotPresent
    config:
      verbose: true
      log:
        level: info
      proxy:
        addr: http://127.0.0.1:4001
      containerRuntime:
        containerd:
          configPath: /etc/containerd/config.toml
          registries:
            - hostNamespace: docker.io
              serverAddr: https://index.docker.io
              capabilities: ["pull", "resolve"]
              skipVerify: true
            - hostNamespace: ghcr.io
              serverAddr: https://ghcr.io
              capabilities: ["pull", "resolve"]
              skipVerify: true
            - hostNamespace: artifact
              serverAddr: https://artifactory.com
              capabilities: ["pull", "resolve"]
              skipVerify: true
            - hostNamespace: artifact-dev
              serverAddr: https://artifactory-dev.com
              capabilities: ["pull", "resolve"]
              skipVerify: true
  config:
    verbose: true
    log:
      level: info
    host:
      idc: ""
      location: ""
    server:
      pluginDir: /data/dragonfly/dfdaemon/plugins/
      cacheDir: /data/dragonfly/dfdaemon/cache/
    download:
      server:
        socketPath: /var/run/dragonfly/dfdaemon.sock
      rateLimit: 10GiB
      pieceTimeout: 30s
      concurrentPieceCount: 16
    upload:
      server:
        port: 4000
      disableShared: false
      rateLimit: 10GiB
    manager:
      addr: ""
    scheduler:
      announceInterval: 5m
      scheduleTimeout: 30s
      maxScheduleCount: 5
      enableBackToSource: true
    dynconfig:
      refreshInterval: 5m
    storage:
      dir: /data/dragonfly/storage/
      keep: false
      writeBufferSize: 131072
      readBufferSize: 131072
    gc:
      interval: 900s
      policy:
        taskTTL: 168h
        distHighThresholdPercent: 90
        distLowThresholdPercent: 70
    proxy:
      server:
        port: 4001
      rules:
        - regex: "blobs/sha256.*"
      registryMirror:
        addr: https://index.docker.io

Environment:

  • Dragonfly version: v0.2.0
  • OS: Amazon Linux 2
  • Kernel (e.g. uname -a): 5.10.228-219.884.amzn2.x86_64
  • Others: helm chart version: dragonfly-1.2.28

Container Runtime Config

/etc/containerd/certs.d
├── docker.io
│   └── hosts.toml
├── ghcr.io
│   └── hosts.toml
├── artifact
│   └── hosts.toml
└── artifact-dev
    └── hosts.toml
  • /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://index.docker.io"

[host."http://127.0.0.1:4001"]
capabilities = ["pull", "resolve"]
skip_verify = true

[host."http://127.0.0.1:4001".header]
X-Dragonfly-Registry = "https://index.docker.io"
  • dfdaemon
tcp        0      0 0.0.0.0:4001            0.0.0.0:*               LISTEN      1141321/dfdaemon
@kelein kelein added the bug label Jan 3, 2025
@gaius-qi
Member

gaius-qi commented Jan 7, 2025

@kelein Dragonfly has no logs, so the traffic is not passing through Dragonfly. We need to check the containerd log to see why the Dragonfly mirror is not used. Please provide the containerd log and the dfdaemon log.
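
Before digging into the containerd log, it can help to rule out the simplest cause, namely that nothing is accepting connections on the proxy address. The snippet below is only a minimal sketch: the function name `proxy_is_listening` is made up for illustration, and 127.0.0.1:4001 is the proxy address taken from the values.yaml in this report. A successful connection only shows that the port is open; it does not show that containerd actually routes pulls through it.

```python
import socket


def proxy_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This checks reachability only. It says nothing about whether
    containerd is configured to use the address as a mirror.
    """
    try:
        # create_connection raises OSError (refused, timeout, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the node, `proxy_is_listening("127.0.0.1", 4001)` returning True while dfdaemon.log stays empty would point at the containerd side rather than at dfdaemon.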

@gaius-qi gaius-qi self-assigned this Jan 7, 2025
@kelein
Author

kelein commented Jan 7, 2025

@gaius-qi This issue was caused by an error in my containerd config.toml. I first installed Dragonfly with the client image dragonflyoss/client:v0.1.125, which auto-generated a strange directory, /etc/containerd/certs.d:/etc/docker/certs.d, like this:

/etc/containerd/certs.d:
└── etc
    └── docker
        └── certs.d
            ├── docker.io
            │   └── hosts.toml
            ├── ghcr.io
            │   └── hosts.toml
            ├── artifact
            │   └── hosts.toml
            └── artifact-dev
                └── hosts.toml

After I upgraded the client image to dragonflyoss/client:v0.2.0, the config directory was generated like this:

/etc/containerd/certs.d
├── docker.io
│   └── hosts.toml
├── ghcr.io
│   └── hosts.toml
├── artifact
│   └── hosts.toml
└── artifact-dev
    └── hosts.toml

However, the upgrade did not update config_path automatically, which seems to be why the dfdaemon proxy service did not work. After I changed config_path manually, everything worked as expected.

$ diff config.toml config-old.toml
<       config_path = "/etc/containerd/certs.d"
---
>       config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
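
The mismatch above can be spotted with a small check. This is a rough sketch, not part of Dragonfly or containerd: the helper names `find_config_paths` and `points_at` are hypothetical, and the scan is a plain regular-expression match on `config_path = "..."` lines rather than a real TOML parser. It only reports whether some `config_path` value is exactly the directory that holds the hosts.toml files; it makes no claim about how containerd interprets a colon-separated value.

```python
import posixpath
import re

# Matches lines such as:  config_path = "/etc/containerd/certs.d"
# This is a simple line scan, not a full TOML parser.
CONFIG_PATH_RE = re.compile(
    r'^[ \t]*config_path[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE
)


def find_config_paths(config_text):
    """Return every config_path value that appears in the config.toml text."""
    return CONFIG_PATH_RE.findall(config_text)


def points_at(config_text, certs_dir):
    """True if some config_path value is exactly certs_dir (trailing slash ignored)."""
    wanted = posixpath.normpath(certs_dir)
    return any(
        posixpath.normpath(value) == wanted
        for value in find_config_paths(config_text)
    )
```

With the old line from the diff, `points_at(text, "/etc/containerd/certs.d")` is False; with the corrected line it is True.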

@gaius-qi
Member

gaius-qi commented Jan 7, 2025

@kelein Does containerd's config_path support multiple paths? Could you please send me the docs link? Thank you.

@kelein
Author

kelein commented Jan 8, 2025

It seems not. The official docs do not mention support for multiple paths: containerd-registry-configuration

@gaius-qi
Member

@kelein What was the containerd configuration before the client changed it? Did it include multiple paths?

@kelein
Author

kelein commented Jan 15, 2025

@gaius-qi No, there was only a single config_path line.

@gaius-qi
Member

@kelein Please provide me with the contents of containerd's config.toml (which has not been changed by the client) so that I can reproduce it.

@kelein
Author

kelein commented Jan 15, 2025

version = 2
root = "/data/containerd/store"
state = "/data/containerd/run"
oom_score = 0



[grpc]
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = ""
  level = "info"
  format = ""
  uid = 0
  gid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.k8s.io/pause:3.9"
    max_container_log_line_size = -1
    enable_unprivileged_ports = false
    enable_unprivileged_icmp = false
    enable_selinux = false
    disable_apparmor = false
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    image_pull_progress_timeout = "5m"
    
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      snapshotter = "overlayfs"
      discard_unpacked_layers = true
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          base_runtime_spec = "/etc/containerd/cri-base.json"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            systemdCgroup = true
            binaryName = "/usr/local/bin/runc"

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]

        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."artifact.jfrog.com"]
          endpoint = ["https://artifact.jfrog.com"]

        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."artifact-dev.jfrog.com"]
          endpoint = ["https://artifact-dev.jfrog.com"]

@CormickKneey CormickKneey self-assigned this Jan 16, 2025
@gaius-qi
Member

@CormickKneey Thanks!

@CormickKneey
Contributor

CormickKneey commented Jan 16, 2025

What dfinit does is simply create the following configuration and nothing more, as containerd automatically reads all the folders under /etc/containerd/certs.d.

/etc/containerd/certs.d
├── docker.io
│   └── hosts.toml
├── ghcr.io
│   └── hosts.toml
├── artifact
│   └── hosts.toml
└── artifact-dev
    └── hosts.toml

So there is no need for multiple paths.
I just checked, and this feature works properly. Let's get back to your issue: are the changes made by dfinit the same as shown above? If so, can you tell me which containerd version you are using? @kelein
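
To make the expected layout concrete, here is a minimal sketch of what one generated hosts.toml would contain for a registry entry. It is an illustration only and not dfinit's actual code: the function names `render_hosts_toml` and `hosts_toml_path` are hypothetical, and the output format is copied from the docker.io hosts.toml posted in the report.

```python
import posixpath


def render_hosts_toml(server_addr, proxy_addr,
                      capabilities=("pull", "resolve"), skip_verify=True):
    """Build hosts.toml text that sends pulls for server_addr through proxy_addr."""
    caps = ", ".join(f'"{c}"' for c in capabilities)
    return (
        f'server = "{server_addr}"\n'
        "\n"
        f'[host."{proxy_addr}"]\n'
        f"capabilities = [{caps}]\n"
        # TOML booleans are lowercase, unlike Python's True/False.
        f"skip_verify = {str(skip_verify).lower()}\n"
        "\n"
        f'[host."{proxy_addr}".header]\n'
        f'X-Dragonfly-Registry = "{server_addr}"\n'
    )


def hosts_toml_path(certs_dir, host_namespace):
    """Return <certs_dir>/<host_namespace>/hosts.toml, matching the tree above."""
    return posixpath.join(certs_dir, host_namespace, "hosts.toml")
```

Calling `render_hosts_toml("https://index.docker.io", "http://127.0.0.1:4001")` gives the same text as the docker.io file in the report, and `hosts_toml_path("/etc/containerd/certs.d", "docker.io")` gives the location where containerd would need to find it.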

@kelein
Author

kelein commented Jan 18, 2025

❯ /bin/containerd --version
containerd github.com/containerd/containerd 1.7.23 57f17b0a6295a39009d861b89e3b3b87b005ca27

Sure. The only thing I want to know is why Dragonfly client v0.1.125 automatically created the directory /etc/containerd/certs.d: and updated containerd's config_path to config_path = "/etc/containerd/certs.d:/etc/docker/certs.d". Thanks.

@CormickKneey
Contributor

Creating /etc/containerd/certs.d is expected; however, modifying config_path to "/etc/containerd/certs.d:/etc/docker/certs.d" is not as intended. It should be set to "/etc/containerd/certs.d". I'll review the code for this version.

@CormickKneey
Contributor

Hi @kelein
I checked that specific version, and everything looks good, as follows:

##################### Config after modified ########################
root@dev:/data/d7y
$ diff  /etc/containerd/config.toml  /etc/containerd/config.toml.original
60c60,69
< config_path = "/etc/containerd/certs.d"
---
> [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
>
> [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
> endpoint = ["https://registry-1.docker.io"]
>
> [plugins."io.containerd.grpc.v1.cri".registry.mirrors."artifact.jfrog.com"]
> endpoint = ["https://artifact.jfrog.com"]
>
> [plugins."io.containerd.grpc.v1.cri".registry.mirrors."artifact-dev.jfrog.com"]
> endpoint = ["https://artifact-dev.jfrog.com"]
######################## Dir created #####################
root@dev:/data/d7y
$ tree /etc/containerd/certs.d/
/etc/containerd/certs.d/
├── artifact
│   └── hosts.toml
├── artifact-dev
│   └── hosts.toml
├── docker.io
│   └── hosts.toml
└── ghcr.io
    └── hosts.toml

4 directories, 4 files
######################## dfinit config #####################
root@dev:/data/d7y
$ cat dfinit.yaml
proxy:
  addr: http://127.0.0.1:4001
containerRuntime:
  containerd:
    configPath: /etc/containerd/config.toml
    registries:
      - hostNamespace: docker.io
        serverAddr: https://index.docker.io
        capabilities: [ "pull", "resolve" ]
        skipVerify: true
      - hostNamespace: ghcr.io
        serverAddr: https://ghcr.io
        capabilities: [ "pull", "resolve" ]
        skipVerify: true
      - hostNamespace: artifact
        serverAddr: https://artifactory.com
        capabilities: [ "pull", "resolve" ]
        skipVerify: true
      - hostNamespace: artifact-dev
        serverAddr: https://artifactory-dev.com
        capabilities: [ "pull", "resolve" ]
        skipVerify: true
########################  dfinit version #####################
root@dev:/data/d7y
$ ./client/target/debug/dfinit -V
dfinit 0.1.125 (df39410f2, 2024-12-09)

Can you check what differs in your environment?
