unable to build kustomize for mnist example #681

ryandawsonuk · 2019-11-21T14:30:11Z

I'm trying to follow the mnist example using the local storage steps, but when I run kustomize build . I get:

no matches for OriginalId kubeflow.org_v1beta2_TFJob|~X|$(trainingName); no matches for CurrentId kubeflow.org_v1beta2_TFJob|~X|$(trainingName); failed to find unique target for patch kubeflow.org_v1beta2_TFJob|$(trainingName

I've tried with kustomize v3 (go get -u sigs.k8s.io/kustomize/kustomize/v3) and v2 (go get -u sigs.k8s.io/kustomize/kustomize/v2) but I get the same error with both. I am running from the training/local directory (have also tried the GCS one and get the same error).

I'm not able to get as far as #672 as I can't get the kustomize build . step to complete.

issue-label-bot · 2019-11-21T14:30:13Z

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.61. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

fenglixa · 2019-11-22T02:23:01Z

kustomize 2.0.3 is required, as mentioned in the mnist example documentation.

Other versions of kustomize seem to have problems with it. I remember similar issues being logged and closed before.

ryandawsonuk · 2019-11-22T16:13:03Z

Oh I hadn't noticed that. But this time I tried with go get sigs.k8s.io/kustomize/kustomize/[email protected] and I get the same error :( I removed the existing kustomize version first and did a which kustomize to check. Actually it's a different error from the one referenced in the doc

ryandawsonuk · 2019-11-22T16:35:11Z

I've tried to deduce what the kustomize should evaluate to but I keep getting a path or format wrong. Would you be able to share an example?

fenglixa · 2019-11-25T01:57:33Z

Here is the output (a successful example) on my side after running "kustomize build .":

apiVersion: v1
data:
  batchSize: "100"
  exportDir: /mnt/export
  learningRate: "0.02"
  modelDir: /mnt
  name: tfjob-021
  pvcMountPath: /mnt
  pvcName: fengpvc
  trainSteps: "200"
kind: ConfigMap
metadata:
  name: mnist-map-training-4t25c985bg
---
apiVersion: kubeflow.org/v1beta2
kind: TFJob
metadata:
  name: tfjob-021
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.02"
            image: docker.io/fenglixa/mytfmodel:tag
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: fengpvc
    Ps:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.02"
            image: docker.io/fenglixa/mytfmodel:tag
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: fengpvc
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.02"
            image: docker.io/fenglixa/mytfmodel:tag
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: fengpvc

fenglixa · 2019-11-25T02:02:33Z

This should be the same issue as #609.

ryandawsonuk · 2019-11-25T15:33:42Z

After editing that yaml I was able to run the TFJob.

plaffitte · 2019-11-28T12:48:09Z

@ryandawsonuk How exactly did you solve this? I can't get past the issue with the TFJob version...

ryandawsonuk · 2019-11-28T13:31:56Z

@plaffitte I haven't got kustomize working yet; I just modified the YAML that @fenglixa provided above to use the v1 format - SeldonIO/seldon-core#1106 (comment)
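
For anyone taking the same workaround, the manual edit can be sketched in a few lines of Python. This is only an illustration, not the method used in the linked comment: the helper name `bump_tfjob_api_version` is hypothetical, and it changes nothing except the top-level apiVersion of TFJob documents. Any other field differences between v1beta2 and v1 are not handled and would still need checking by hand.

```python
import re

OLD_API = "kubeflow.org/v1beta2"
NEW_API = "kubeflow.org/v1"


def bump_tfjob_api_version(manifest: str) -> str:
    """Switch the top-level apiVersion of TFJob documents from v1beta2 to v1."""
    # Split the multi-document YAML stream on lines that contain only '---'.
    docs = re.split(r"(?m)^---[ \t]*$", manifest)
    updated = []
    for doc in docs:
        # Only touch documents whose top-level kind is TFJob.
        if re.search(r"(?m)^kind:[ \t]*TFJob[ \t]*$", doc):
            doc = re.sub(
                r"(?m)^apiVersion:[ \t]*" + re.escape(OLD_API) + r"[ \t]*$",
                "apiVersion: " + NEW_API,
                doc,
            )
        updated.append(doc)
    return "---".join(updated)


# Shortened copy of the rendered output posted earlier in this thread.
rendered = """apiVersion: v1
kind: ConfigMap
metadata:
  name: mnist-map-training-4t25c985bg
---
apiVersion: kubeflow.org/v1beta2
kind: TFJob
metadata:
  name: tfjob-021
  namespace: kubeflow
"""

print(bump_tfjob_api_version(rendered))
```

The ConfigMap document is left untouched because the match is restricted to documents with kind: TFJob.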

plaffitte · 2019-11-28T17:05:47Z

I get the following error:

unable to recognize "config.yaml": no matches for kind "TFJob" in version "kubeflow.org/v1beta2"

My file looks like this:

data:
  batchSize: "100"
  exportDir: /mnt/export
  learningRate: "0.01"
  modelDir: /mnt
  name: mnist-train-local
  pvcMountPath: /mnt
  pvcName: mnist-test
  trainSteps: "200"
kind: ConfigMap
metadata:
  name: mnist-map-training-kcc7dkhf4b
---
apiVersion: kubeflow.org/v1beta2
kind: TFJob
metadata:
  name: mnist-train-local
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test
    Ps:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test
    Worker:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test

ryandawsonuk · 2019-11-28T17:10:33Z

Yeah, the TFJob version in Kubeflow is now:

apiVersion: kubeflow.org/v1
kind: TFJob

plaffitte · 2019-11-28T17:12:48Z

Oops, sorry. I actually tried both and failed but copy-pasted the wrong one...
Here's the error I get:

unable to recognize "config.yaml": no matches for kind "TFJob" in version "kubeflow.org/v1"

And my file:

data:
  batchSize: "100"
  exportDir: /mnt/export
  learningRate: "0.01"
  modelDir: /mnt
  name: mnist-train-local
  pvcMountPath: /mnt
  pvcName: mnist-test
  trainSteps: "200"
kind: ConfigMap
metadata:
  name: mnist-map-training-kcc7dkhf4b
---
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train-local
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test
    Ps:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test
    Worker:
      replicas: 1
      template:
        spec:
          containers:
          - command:
            - /usr/bin/python
            - /opt/model.py
            - --tf-model-dir=$(modelDir)
            - --tf-export-dir=$(exportDir)
            - --tf-train-steps=$(trainSteps)
            - --tf-batch-size=$(batchSize)
            - --tf-learning-rate=$(learningRate)
            env:
            - name: modelDir
              value: /mnt
            - name: exportDir
              value: /mnt/export
            - name: trainSteps
              value: "200"
            - name: batchSize
              value: "100"
            - name: learningRate
              value: "0.01"
            image: docker.io/pierremoodagent/mytfmodel:test
            name: tensorflow
            volumeMounts:
            - mountPath: /mnt
              name: local-storage
            workingDir: /opt
          restartPolicy: OnFailure
          volumes:
          - name: local-storage
            persistentVolumeClaim:
              claimName: mnist-test

ryandawsonuk · 2019-11-28T18:12:35Z

What does kubectl get crd tfjobs.kubeflow.org -o yaml return for you?

plaffitte · 2019-11-29T10:20:57Z

It returns Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "tfjobs.kubeflow.org" not found

ryandawsonuk · 2019-11-29T11:52:51Z

Then you need to install the CRD. Did you do a kfctl install of kubeflow?
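
Once the CRD is installed, the same kubectl query with -o json shows which TFJob versions the cluster accepts, which tells you what apiVersion to put in the manifest. Below is a minimal sketch for reading that JSON. The function name `served_versions` and the sample document are made up for illustration; it assumes the usual CustomResourceDefinition layout with a spec.versions list, plus the older single spec.version field as a fallback.

```python
import json


def served_versions(crd_json: str) -> list:
    """Return the API versions a CRD serves, given `kubectl get crd ... -o json` output."""
    spec = json.loads(crd_json).get("spec", {})
    # Each entry in spec.versions has a name and a 'served' flag.
    versions = [v["name"] for v in spec.get("versions", []) if v.get("served", True)]
    # Older CRDs may only carry the single spec.version field.
    if not versions and "version" in spec:
        versions = [spec["version"]]
    return versions


# Hypothetical CRD content, not taken from a real cluster.
sample_crd = json.dumps({
    "spec": {
        "group": "kubeflow.org",
        "names": {"kind": "TFJob"},
        "versions": [
            {"name": "v1", "served": True, "storage": True},
            {"name": "v1beta2", "served": False, "storage": False},
        ],
    }
})

print(served_versions(sample_crd))
```

If the version in your manifest is not in the returned list, kubectl will report "no matches for kind" for that apiVersion even though the CRD exists.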

janeman98 · 2019-12-21T07:51:34Z

I still get an error (different from the one with v3.4.0) when using v2.0.3:

kustomize version
Version: {KustomizeVersion:2.0.3 GitCommit:a6f65144121d1955266b0cd836ce954c04122dc8 BuildDate:2019-03-05T20:37:42Z GoOs:linux GoArch:amd64}
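
Since the version matters so much here, a script that runs the example could check it before calling kustomize build. A rough sketch follows, assuming the output format shown above. The helper name `parse_kustomize_version` is hypothetical, and other kustomize releases may print a different format, in which case it returns None.

```python
import re
from typing import Optional


def parse_kustomize_version(output: str) -> Optional[str]:
    """Extract the KustomizeVersion field from `kustomize version` output."""
    match = re.search(r"KustomizeVersion:(\S+)", output)
    return match.group(1) if match else None


# Output copied from the comment above.
sample = (
    "Version: {KustomizeVersion:2.0.3 "
    "GitCommit:a6f65144121d1955266b0cd836ce954c04122dc8 "
    "BuildDate:2019-03-05T20:37:42Z GoOs:linux GoArch:amd64}"
)

version = parse_kustomize_version(sample)
if version != "2.0.3":
    print("warning: the mnist example docs ask for kustomize 2.0.3, found", version)
else:
    print("kustomize version OK:", version)
```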

jtfogarty · 2020-01-08T23:14:50Z

/area example

k8s-ci-robot · 2020-01-08T23:14:51Z

@jtfogarty: The label(s) area/kustomize cannot be applied, because the repository doesn't have them

In response to this:

/area kustomize

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jlewi · 2020-02-11T01:33:36Z

The version should probably be v1.

stale · 2020-05-11T03:51:10Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

issue-label-bot · 2020-05-11T03:51:18Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
area/tfjob	0.71

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

issue-label-bot bot added the kind/bug label Nov 21, 2019

ryandawsonuk mentioned this issue Nov 21, 2019

unable to work with pvc kserve/kserve#563

Closed

ryandawsonuk mentioned this issue Nov 22, 2019

[kustomize 2.1.0] cannot read data from configMap kubernetes-sigs/kustomize#1295

Closed

jlewi added the priority/p2 label Nov 26, 2019

fenglixa mentioned this issue Dec 19, 2019

Add fenglixa from IBM kubeflow/internal-acls#198

Merged

k8s-ci-robot added the area/example label Feb 1, 2020

stale bot added the lifecycle/stale label May 11, 2020

issue-label-bot bot added the area/tfjob label May 11, 2020

stale bot closed this as completed May 18, 2020
