Dragonfly preheat not speeding up subsequent image load; errors reported #3674

Open
amholler opened this issue Nov 27, 2024 · 9 comments

amholler commented Nov 27, 2024

Bug report:

On a GKE regional cluster of 3 e2-standard-16 nodes, I successfully installed Dragonfly via:
helm install --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly --version 1.2.24 -f values.upd.yaml

where values.upd.yaml contained:
manager:
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

scheduler:
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

seedClient:
  metrics:
    enable: true
  config:
    verbose: true
  resources:
    requests:
      cpu: "2"
      memory: "12Gi"
    limits:
      cpu: "2"
      memory: "12Gi"

client:
  metrics:
    enable: true
  config:
    verbose: true
  dfinit:
    enable: true
    config:
      download:
        rateLimit: 10GiB
        concurrentPieceCount: 16
      upload:
        rateLimit: 10GiB
  resources:
    requests:
      cpu: "2"
      memory: "12Gi"
    limits:
      cpu: "2"
      memory: "12Gi"

I created a token and then ran the following all_peers preheat job, which completed successfully:

curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Njg4NDRhZGYtYmY0ZS00NmIwLTk5MWQtZjIwM2U1MjBjNGM1' \
--data-raw '{
    "type": "preheat",
    "args": {
        "type": "image",
        "url": "https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311",
        "scope": "all_peers"
    }
}'
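Because the job response is one long JSON document, checking by eye that every layer reached every peer is error prone. A minimal Python sketch of that check follows, assuming only the field names visible in the response in this report (`result.job_states[].results[].success_tasks[]` with `hostname` and `url`). The helper names `summarize_preheat` and `hosts_missing`, the `sample` payload, and its hostnames and URLs are hypothetical and are not real output; I also assume `failure_tasks` entries carry the same `hostname` and `url` keys, which this report does not show.

```python
import json
from collections import defaultdict


def summarize_preheat(job: dict) -> dict:
    """Map each preheated blob URL to the hostnames that reported success or failure."""
    summary = defaultdict(lambda: {"success": set(), "failure": set()})
    for state in (job.get("result") or {}).get("job_states") or []:
        for result in state.get("results") or []:
            # failure_tasks is null in the response when nothing failed.
            for task in result.get("success_tasks") or []:
                summary[task["url"]]["success"].add(task["hostname"])
            # Assumption: failure entries use the same keys as success entries.
            for task in result.get("failure_tasks") or []:
                summary[task.get("url", "")]["failure"].add(task.get("hostname", ""))
    return dict(summary)


def hosts_missing(summary: dict, expected_hosts: set) -> dict:
    """Return, per blob URL, the expected hosts that did not report success."""
    missing = {}
    for url, hosts in summary.items():
        gap = expected_hosts - hosts["success"]
        if gap:
            missing[url] = gap
    return missing


# Hypothetical, abbreviated payload in the same shape as the job response.
sample = json.loads("""
{
  "state": "SUCCESS",
  "result": {
    "job_states": [
      {"state": "SUCCESS", "results": [{
        "failure_tasks": null,
        "scheduler_cluster_id": 1,
        "success_tasks": [
          {"hostname": "node-a", "ip": "10.0.0.1",
           "url": "https://registry.example/v2/repo/blobs/sha256:aaaa"},
          {"hostname": "node-b", "ip": "10.0.0.2",
           "url": "https://registry.example/v2/repo/blobs/sha256:aaaa"}
        ]}]},
      {"state": "SUCCESS", "results": [{
        "failure_tasks": null,
        "scheduler_cluster_id": 1,
        "success_tasks": [
          {"hostname": "node-a", "ip": "10.0.0.1",
           "url": "https://registry.example/v2/repo/blobs/sha256:bbbb"}
        ]}]}
    ]
  }
}
""")

if __name__ == "__main__":
    summary = summarize_preheat(sample)
    for url, gap in hosts_missing(summary, {"node-a", "node-b"}).items():
        print(url, "not preheated on:", sorted(gap))
```

To use it against a real job, save the response to a file, load it with `json.load`, and pass the set of node and seed-client hostnames you expect to hold the image.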

{"id":1,"created_at":"2024-11-27T18:33:17Z","updated_at":"2024-11-27T18:36:05Z","is_del":0,"task_id":"group_e2a72159-58dc-4a90-acd1-8b7aa1ca5e71","bio":"","type":"preheat","state":"SUCCESS","args":{"concurrent_count":50,"filtered_query_params":"X-Amz-Algorithm\u0026X-Amz-Credential\u0026X-Amz-Date\u0026X-Amz-Expires\u0026X-Amz-SignedHeaders\u0026X-Amz-Signature\u0026X-Amz-Security-Token\u0026X-Amz-User-Agent\u0026X-Goog-Algorithm\u0026X-Goog-Credential\u0026X-Goog-Date\u0026X-Goog-Expires\u0026X-Goog-SignedHeaders\u0026X-Goog-Signature\u0026OSSAccessKeyId\u0026Expires\u0026Signature\u0026SecurityToken\u0026AccessKeyId\u0026Signature\u0026Expires\u0026X-Obs-Date\u0026X-Obs-Security-Token\u0026q-sign-algorithm\u0026q-ak\u0026q-sign-time\u0026q-key-time\u0026q-header-list\u0026q-url-param-list\u0026q-signature\u0026x-cos-security-token\u0026ns","headers":null,"password":"","platform":"","scope":"all_peers","tag":"","timeout":1800000000000,"type":"image","url":"https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311","username":""},"result":{"created_at":"2024-11-27T18:33:17.843257441Z","group_uuid":"group_e2a72159-58dc-4a90-acd1-8b7aa1ca5e71","job_states":[{"created_at":"2024-11-27T18:33:17.843257441Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"h
ttps://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:61feeac8814edc3f406d9a29a2ee0c3ed71ea2f3c60d3bace48950c1ff588742"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_483d138e-68a2-43f4-b9e5-e7676758723a","ttl":0},{"created_at":"2024-11-27T18:33:17.844129378Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:43f89b94cd7df92a2f7e565b8fb1b7f502eff2cd225508cbd7ea2d36a9a3a601"}]}],"state":"SUCCESS","task_name":"pre
heat","task_uuid":"task_1ba2f78b-e6d3-4db5-96cb-e2693a3fd2b7","ttl":0},{"created_at":"2024-11-27T18:33:17.844666533Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5e3b7ee7738140e8f4608c3945b6e1ed4f9fb75db53a04e19ba0a6661e7cc4fe"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_0f9bb343-632f-4f07-9ab4-af22eafab36b","ttl":0},{"created_at":"2024-11-27T18:33:17.845247255Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256
:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5bd037f007fdda13ae5a5f43a199d6677db1f9059c2980c84726e3a43fab169a"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_ff963648-0402-4a18-9ce6-41cc26b7d610","ttl":0},{"created_at":"2024-11-27T18:33:17.845737521Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha
256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4cda774ad2ecef28c9a1cd97594f7199071c83769f91c5d109eb1cb6770ecdff"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_a876cdad-ece2-4768-af3a-221c1006c71b","ttl":0},{"created_at":"2024-11-27T18:33:17.846171112Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:775f22adee620daec0db645bad7027db4c1ecf22520412e1b2466fc73d54d19b"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_8562a75f-8727-47a0-aa79-6c39fd783612","ttl":0},{"created_at":"2024-11-27T18:33:17.846592178Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2
","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:263fc748118f7937f811e3e9c9355318db07dd2dd1dccc370dadaa7d0b5ed692"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c29ca16d-bfbf-4b5b-b0c3-2634dab32cc2","ttl":0},{"created_at":"2024-11-27T18:33:17.84709593Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-def
ault-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:16c36d0187d03bd0de84d870ded86c45fabd78f4bfdb2ed90177e5fc4dd33d11"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c2660534-1bd9-4d68-9272-add419776cf4","ttl":0},{"created_at":"2024-11-27T18:33:17.848092386Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c77873efc83f9a6c31064e3e8a5dc02430213f2d74c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:e7a56570655c990ecc804c778
73efc83f9a6c31064e3e8a5dc02430213f2d74c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e3b209e8-7ac6-498a-9395-d6a83106e2b2","ttl":0},{"created_at":"2024-11-27T18:33:17.848490128Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:507fc9045cbad45c1c4ca554a6453fe0a1c9ae74667db0612fec7475256d5c23"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_561899d6-9538-4e6c-968a-de630004fd57","ttl":0},{"created_at":"2024-11-27T18:33:17.8489235Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10
.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:23b7d8e07c16707ff4ec3ca558a8099c454953c840156c318a60a6b4273846a0"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_93c5d79e-ebfd-4548-88bb-91514e92da50","ttl":0},{"created_at":"2024-11-27T18:33:17.849376889Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"dragonfly-seed-client-1","i
p":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:922ac8fcb88926d95550e82f83c14a4f3f3eaab635e7acf43ee0c59dea0c14d7"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_5faf2d1d-5fb8-4b64-92eb-c18606741d05","ttl":0},{"created_at":"2024-11-27T18:33:17.849681518Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:68075f2beca1cfd3f243ec110000716dff39d895f4d5e0d3faba7ace430f9633"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c6bd7eb3-b51a-4fbc-8766-3875600184a6","ttl":0},{"created_at":"2024-11-27T18:33:17.850123858Z","error":"","results":[{"failure_tasks":null,"
scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8509afffa2f4447cf6eb4060aba992e855f61c1fdbaef360bef367a3deea5afd"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_fb3c48ba-1a7b-4d6a-8f14-e837fde2f29f","ttl":0},{"created_at":"2024-11-27T18:33:17.850482208Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/raypro
ject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:b6a3e93ea08fa0a42ceaeda8786839ed06d17429e072e5cba725b3d7e0116b19"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_2d2ec513-f5fb-40f1-a775-f60faf17fab4","ttl":0},{"created_at":"2024-11-27T18:33:17.850939108Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https
://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e515c281-4d21-40d1-b731-2307a1527cc8","ttl":0},{"created_at":"2024-11-27T18:33:17.851427527Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:0fdc0c77854ade5095e513b49a62f25fcf894ad08e8c93d6ab2418c02d293b2c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_db3a1cea-5ba6-4139-ac23-abda8d4b766c","ttl":0},{"created_at":"2024-11-27T18:33:17.851827952Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a5
6c4e097a8a"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a333090de9f0b4a401a269aebeef18871555056477cf4b607f89a56c4e097a8a"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_74851ce1-517d-435f-b582-f42d108fdaf2","ttl":0},{"created_at":"2024-11-27T18:33:17.852345932Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad
9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:cfd03f593d3c72597de12049ad9da078ecb03b1cf82f5c547b37db13dce0193c"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_c067d25e-57ad-4e4b-a676-a478500edc4e","ttl":0},{"created_at":"2024-11-27T18:33:17.852818036Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:6c0cfc9736597fd3fba4b60241efc7fc1410d3dd66625a30f5caaa0194d913da"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_1c0a5956-3dd7-4147-a0d6-fdbaa1703162","ttl":0},{"created_at
":"2024-11-27T18:33:17.853333972Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:a07abd82fba1ad80e123eafea1aff1dd4d7404d10eeebef491ae46d100f24508"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_4d3d6c33-8bed-426b-830a-aeb5f2384a04","ttl":0},{"created_at":"2024-11-27T18:33:17.854141265Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-2","
ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_2b1bfa6b-7307-45da-9a9c-a816b51e4b3c","ttl":0},{"created_at":"2024-11-27T18:33:17.854647246Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc25
1cbd4ee24d"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:f2b356ed4af6ca3647340bb3b8d6c95e66e5d22d2f790b8d80dc251cbd4ee24d"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_e7bb70dc-f677-4986-b516-6ad36284d4cb","ttl":0},{"created_at":"2024-11-27T18:33:17.855116976Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:7c10ed83ba3318d7ef10210ae469247c685a5be84f3b97fe12d21169d31d3dd2"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_67e37355-ff8a-4d9b-801f-e64c6bb44d99","ttl":0},{"created_at":"2024-11-27T18:33:17.855531247Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256
:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:8b55ca9b80a879137ae6d9937a84fc4c5bee3aaeefb7dd815e7fe17a98a9e351"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_fcb229fa-b20a-405f-9193-25785356c723","ttl":0},{"created_at":"2024-11-27T18:33:17.855951857Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https
://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:5d7bcf957c0928678aa351babc3e2eede543bf26e9bcec47907ee1598d87c8c7"}]}],"state":"SUCCESS","task_name":"preheat","task_uuid":"task_b5486c26-77ed-4f34-bb50-a993b52f2594","ttl":0},{"created_at":"2024-11-27T18:33:17.856466669Z","error":"","results":[{"failure_tasks":null,"scheduler_cluster_id":1,"success_tasks":[{"hostname":"dragonfly-seed-client-0","ip":"10.24.2.6","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-23f7047d-0c4w","ip":"10.128.15.206","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"dragonfly-seed-client-1","ip":"10.24.0.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-cbcddcd0-mcp6","ip":"10.128.15.205","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"dragonfly-seed-client-2","ip":"10.24.1.11","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"},{"hostname":"gke-anne-regional-default-pool-f7fcf7d7-68px","ip":"10.128.15.207","url":"https://index.docker.io/v2/rayproject/ray-ml/blobs/sha256:777e2226c898d04426bed857c87ed5f1e5a6e605125725d4c772bee365abef47"}]}],"state":"SUCCESS","task_name":"prehea
t","task_uuid":"task_503f418f-4cfc-4437-a434-c636e55041eb","ttl":0}],"state":"SUCCESS","updated_at":"2024-11-27T18:36:05.080487112Z"},"user_id":0,"user":{"id":0,"created_at":"0001-01-01T00:00:00Z","updated_at":"0001-01-01T00:00:00Z","is_del":0,"email":"","name":"","avatar":"","phone":"","state":"","location":"","bio":"","configs":null},"seed_peer_clusters":[],"scheduler_clusters":[{"id":1,"created_at":"2024-11-27T18:23:27Z","updated_at":"2024-11-27T18:23:27Z","is_del":0,"name":"cluster-1","bio":"","config":{"candidate_parent_limit":4,"filter_parent_limit":15,"job_rate_limit":10},"client_config":{"load_limit":200},"scopes":{},"is_default":true,"seed_peer_clusters":null,"schedulers":null,"peers":null,"jobs":null}]}
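
A quick way to confirm that every expected host appears under "success_tasks" for each blob in a response like the one above is sketched here. This is a minimal sketch, not part of the Dragonfly tooling: the function names are hypothetical, and it assumes the caller has already pulled the list of per-blob task entries (the objects that carry "results", "state" and "task_uuid") out of the job response.

```python
from collections import defaultdict


def hosts_per_blob(task_entries):
    """Map each preheated blob URL to the set of hostnames that reported success."""
    coverage = defaultdict(set)
    for entry in task_entries:
        # "results" and "success_tasks" may be null in the response, hence the `or []`.
        for result in entry.get("results") or []:
            for task in result.get("success_tasks") or []:
                coverage[task["url"]].add(task["hostname"])
    return dict(coverage)


def missing_hosts(task_entries, expected_hosts):
    """Return {blob_url: [hostnames without a success entry]} for incomplete blobs only."""
    expected = set(expected_hosts)
    gaps = {}
    for url, hosts in hosts_per_blob(task_entries).items():
        absent = sorted(expected - hosts)
        if absent:
            gaps[url] = absent
    return gaps


if __name__ == "__main__":
    # Tiny hand-made sample in the same shape as the response (not the full data).
    sample = [
        {
            "results": [
                {
                    "failure_tasks": None,
                    "scheduler_cluster_id": 1,
                    "success_tasks": [
                        {"hostname": "dragonfly-seed-client-0", "ip": "10.24.2.6", "url": "blob-a"},
                        {"hostname": "gke-anne-regional-default-pool-23f7047d-0c4w", "ip": "10.128.15.206", "url": "blob-a"},
                    ],
                }
            ],
            "state": "SUCCESS",
        }
    ]
    expected = ["dragonfly-seed-client-0", "gke-anne-regional-default-pool-23f7047d-0c4w"]
    print(missing_hosts(sample, expected))
```

An empty dictionary means every expected host reported success for every blob in the entries that were passed in. It says nothing about whether containerd later pulls through the local dfdaemon.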

I then executed a deployment that used the preheated image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rayproject
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rayproject
  template:
    metadata:
      labels:
        app: rayproject
        elotl-luna: "true"
    spec:
      containers:
        - name: rayproject
          image: rayproject/ray-ml:2.33.0.914af0-py311
          resources:
            requests:
              cpu: 1
              memory: 2Gi
            limits:
              cpu: 1
              memory: 2Gi
          command:
            - sleep
            - "infinity"

and the image pull in the deployment was just as slow as usual:

kubectl describe pod/rayproject-b75949b97-5xlws
...
  Normal  Pulled     23s   kubelet            Successfully pulled image "rayproject/ray-ml:2.33.0.914af0-py311" in 5m42.029s (5m42.029s including waiting). Image size: 11092293251 bytes.
  
I checked the dfdaemon logs on the 3 clients and they contained error messages such as (I can upload full logs):

2024-11-27T18:33:18.220138717+00:00 ERROR download_task:download:download_partial_with_scheduler: dragonfly-client/src/resource/task.rs:519: announce peer failed: TonicStatus(Status { code: NotFound, message: "host 10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w not found", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }) host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.220258164+00:00 ERROR download_task:download: dragonfly-client/src/resource/task.rs:391: download with scheduler error: TonicStatus(Status { code: NotFound, message: "host 10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w not found", metadata: MetadataMap { headers: {"content-type": "application/grpc"} }, source: None }) host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.246442266+00:00 ERROR download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer:run:collect_from_remote_peers: dragonfly-client/src/resource/piece_collector.rs:279: sync pieces failed: task 533 was cancelled host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="752e6c00b91aa95cf393a5fb243b84c0dc39d91497faf60cd050af51be61596f" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-a079ad57-33ce-405a-ac69-c4b0f8e6b502"
<etc>

The host name that is reported as not found does not match any of my host names. I don't know whether these errors are related to the preheat not speeding up the image load, but I assume they may be.

$ kubectl get nodes -o wide
NAME                                           STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-anne-regional-default-pool-23f7047d-0c4w   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.206   34.55.239.12     Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
gke-anne-regional-default-pool-cbcddcd0-mcp6   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.205   34.172.252.60    Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
gke-anne-regional-default-pool-f7fcf7d7-68px   Ready    <none>   59m   v1.30.5-gke.1014003   10.128.15.207   35.192.141.105   Container-Optimized OS from Google   6.1.100+         containerd://1.7.19
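
Comparing the host_id from the error messages against this node list: the value looks like a node's internal IP joined to the node name with a hyphen. That is only an observation from the values shown in this report, not a documented format, so the sketch below treats it as an assumption. The helper names are hypothetical.

```python
import ipaddress

# Node name -> INTERNAL-IP, copied from the `kubectl get nodes -o wide` output.
NODES = {
    "gke-anne-regional-default-pool-23f7047d-0c4w": "10.128.15.206",
    "gke-anne-regional-default-pool-cbcddcd0-mcp6": "10.128.15.205",
    "gke-anne-regional-default-pool-f7fcf7d7-68px": "10.128.15.207",
}


def split_host_id(host_id):
    """Split an ID assumed to have the form '<ipv4>-<hostname>' into (ip, hostname)."""
    ip, _, hostname = host_id.partition("-")  # an IPv4 address contains no hyphen
    ipaddress.ip_address(ip)  # raises ValueError if the prefix is not an IP address
    return ip, hostname


def matches_node(host_id, nodes):
    """True if the hostname part is a known node and the IP part is that node's IP."""
    ip, hostname = split_host_id(host_id)
    return nodes.get(hostname) == ip


if __name__ == "__main__":
    reported = "10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w"
    print(split_host_id(reported))
    print(matches_node(reported, NODES))
```

If the check returns True, the ID in the NotFound message refers to one of the listed nodes, and the question becomes why the scheduler did not know that host at that moment.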

Expected behavior:

Expect image pull time to be greatly reduced after the preheat. Don't expect the client logs to be filled with errors.

How to reproduce it:

See details in section above.

Environment:

  • Dragonfly version: 1.2.24
  • OS: cos-113-18244-151-27
  • Kernel (e.g. uname -a): Linux gke-anne-regional-default-pool-23f7047d-0c4w 6.1.100+ #1 SMP PREEMPT_DYNAMIC Sat Aug 24 16:19:44 UTC 2024 x86_64 GNU/Linux
  • Others: GKE 1.30.5-gke.1014003
@amholler amholler added the bug label Nov 27, 2024
@gaius-qi
Member

@amholler Please provide the full log in dfdaemon.log, thanks.

@gaius-qi gaius-qi self-assigned this Nov 28, 2024
@amholler
Author

clientlogs.tar.gz

@amholler
Author

Hi @gaius-qi, I have uploaded a tar archive that includes the dfdaemon.log files from each of the 3 client instances. Thanks!

@gaius-qi
Member

gaius-qi commented Dec 5, 2024

@amholler These log lines show how long it took to download a piece from a remote peer:

2024-11-27T18:33:23.710454826+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer:run:collect_from_remote_peers: dragonfly-client/src/resource/piece_collector.rs:212: received piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 metadata from parent 10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"
2024-11-27T18:33:23.710551717+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer: dragonfly-client/src/resource/task.rs:969: start to download piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 from remote peer "10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed" host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"
2024-11-27T18:33:24.576013333+00:00  INFO download_task:download:download_partial_with_scheduler:download_partial_with_scheduler_from_remote_peer: dragonfly-client/src/resource/task.rs:1065: finished piece 7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68-224 from remote peer Some("10.24.2.6-dragonfly-seed-client-0-78426839-bf26-47bc-ae5e-bd7337b49b87-seed") host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="7608d777a63129791161b983810864ec81ec5fb793e73f53088fcd4710f13c68" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-0acbd9dd-8ceb-4aa9-9495-b0ac4787c1e3"

These log lines show how long it took to download a piece from the source:

2024-11-27T18:33:18.220331735+00:00  INFO download_task:download:download_partial_from_source: dragonfly-client/src/resource/task.rs:1546: start to download piece b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e-0 from source host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"
2024-11-27T18:33:18.297044117+00:00  INFO download_task:download:download_partial_from_source: dragonfly-client/src/resource/task.rs:1601: finished piece b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e-0 from source host_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w" task_id="b253b75db00bc5c4b749493f6349d2d93d4778d6ccd375d96db50aa5251a328e" peer_id="10.128.15.206-gke-anne-regional-default-pool-23f7047d-0c4w-b33de249-93d7-412a-9d71-ebf08280345f"

P2P download is only faster than downloading from the source under the following conditions:

  1. Downloading directly from the source is slower than downloading between peers.
  2. The source bandwidth is saturated during large-scale downloads.
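
For the two excerpts above, the elapsed time per piece can be worked out from the "start to download piece" and "finished piece" timestamps. The following is a minimal Python sketch with a hypothetical helper name. It assumes timestamps with exactly nine fractional digits and a +00:00 offset, as in these logs. The two pieces belong to different tasks and may differ in size, so the comparison is indicative only.

```python
from datetime import datetime, timezone

NS_PER_S = 1_000_000_000


def to_epoch_ns(ts):
    """Convert e.g. '2024-11-27T18:33:23.710551717+00:00' to integer nanoseconds."""
    head, rest = ts.split(".")
    frac, offset = rest[:9], rest[9:]
    if offset != "+00:00":
        raise ValueError("this sketch only handles UTC timestamps")
    dt = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * NS_PER_S + int(frac)


def elapsed_ns(start, finish):
    """Nanoseconds between two log timestamps."""
    return to_epoch_ns(finish) - to_epoch_ns(start)


# Timestamps copied from the log lines quoted above.
remote_ns = elapsed_ns("2024-11-27T18:33:23.710551717+00:00",
                       "2024-11-27T18:33:24.576013333+00:00")
source_ns = elapsed_ns("2024-11-27T18:33:18.220331735+00:00",
                       "2024-11-27T18:33:18.297044117+00:00")

if __name__ == "__main__":
    print(f"remote peer: {remote_ns / 1e6:.3f} ms")  # 865.462 ms
    print(f"source:      {source_ns / 1e6:.3f} ms")  # 76.712 ms
```

In these two samples the piece fetched from the seed peer took roughly 11 times as long as the piece fetched from the source, which corresponds to condition 1 not being met here.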

@amholler
Author

amholler commented Dec 5, 2024

Hi, @gaius-qi Thanks for your analysis! But I'm missing something here: I had issued a successful preheat to all peers, so why does any peer to peer operation need to happen? I thought all peers should have the full image after the preheat from source completed, so image load after the preheat should come directly from the local cache, and hence be very fast.

@gaius-qi
Member

@amholler How do you preheat the image to all peers?

@amholler
Author

amholler commented Dec 10, 2024

Hi @gaius-qi, thanks for responding.

How do you preheat the image to all peers?

As I mentioned in the first comment in this ticket, I did it as follows:

I created a token and then ran the following all_peers preheat job, which completed successfully:

curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Njg4NDRhZGYtYmY0ZS00NmIwLTk5MWQtZjIwM2U1MjBjNGM1' \
--data-raw '{
    "type": "preheat",
    "args": {
        "type": "image",
        "url": "https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311",
        "scope": "all_peers"
    }
}'
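
For anyone scripting this, the same request can be built with Python's standard library. This is only a sketch: the function name and the token placeholder are hypothetical, while the manager address, path, and payload fields are copied from the curl command above. Nothing is sent until `urllib.request.urlopen` is called on the returned object.

```python
import json
import urllib.request


def build_preheat_request(manager_url, token, manifest_url, scope="all_peers"):
    """Build (but do not send) the POST request for an image preheat job."""
    payload = {
        "type": "preheat",
        "args": {"type": "image", "url": manifest_url, "scope": scope},
    }
    return urllib.request.Request(
        url=f"{manager_url}/oapi/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_preheat_request(
        "http://127.0.0.1:8080",
        "<personal-access-token>",  # placeholder, substitute a real token
        "https://index.docker.io/v2/rayproject/ray-ml/manifests/2.33.0.914af0-py311",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the job:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```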

@gaius-qi
Member

@amholler Please provide Manager, Scheduler and one of the peer logs.

@amholler
Author

Hi @gaius-qi, thanks for your response. I can't afford to leave idle cloud clusters running, and since the preheat was reported as successful, I didn't retain those logs. I will need to rerun the scenario to collect them. Before I do, it would be good to agree on the full set of logs to capture, so that the test does not have to be repeated.
[I actually repeated this experiment multiple times before filing this ticket, just to make sure it was reproducible.] Thanks
