Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pytorch plugin usage example #11

Merged

Conversation

igorvalko
Copy link
Contributor

No description provided.

Copy link
Contributor

@kumare3 kumare3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small changes. Thank you


Before running this make sure that
- pytorch plugin is enabled in flytepropeller's config
- `Kubeflow pytorch operator`_ is installed in your k8s cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably say that kustomize is configured to deploy pyrotechnics operator

per_replica_gpu_limit="1",
)
def mnist_pytorch_job(workflow_params, no_cuda, batch_size, test_batch_size, epochs, learning_rate, sgd_momentum, seed, log_interval, save_model, dir, out):
backend_type = dist.Backend.GLOO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So eventually I want this boilerplate code to be in the pytorch wrapper, can you add a TODO: simplify by abstracting the boilerplate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave TODOs in example/boilerplate code? Maybe it's better to create issue to remember this idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue works too, either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue is great, and the TODO can point to the issue :)

dir=dir
)

accuracies = Output(mnist_result.outputs.out, sdk_type=Types.String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t this be list of floats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/lyft/flytesnacks/pull/11/files#diff-daa70d7f0f13b99d1e904cc0cb7fd7a7R152 I'm stringifying it here to be sure that UI will just have to print string I've provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we dont need to do that, since it is just an array of floats, UI can show it, if we mark it as an array of floats. Do you want to change that?

@wild-endeavor wild-endeavor requested a review from kumare3 June 12, 2020 20:47

RUN pip install awscli

RUN pip install tensorboardX==2.0.0 flytekit[pytorch]==0.8.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does TensorBoard work in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we will need to run tensorboard somewhere i guess right?

log_interval=Types.Integer,
save_model=Types.Boolean,
dir=Types.String)
@outputs(out=Types.String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs some improvements like changing output to be the model. @wild-endeavor can you help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand. Currently it seems that the output out is a string, created from str(accuracies)

  1. I think we should change the name from "out" to "accuracies", and I feel like we should make it the same type that's currently the output of []epoch_step(). I don't know what epoch_step produces - can you fill me in @igorvalko ?
  2. This is a separate question though from saving the model though - do we want to save the model as a blob you mean? As a demo though, I don't know how that would be useful - one would have to download the model and use it in order to get numbers (like these accuracies) out of it right? We can save the model file sure, but I feel like the accuracies output makes more sense.

Copy link
Contributor Author

@igorvalko igorvalko Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all MNIST in pytorch world is sort of 'hello world' (https://github.com/pytorch/examples/tree/master/mnist)

One can store model on their own by supplying save_model=true. Here we might only adjust the example to store on s3 (for now it is local FS), but I don't know internals of the model object: whether it will collect distributed model state to master or it's to be done manually.
Epoch - is model training iteration. Accuracy is acquired by evaluating test dataset on a trained model after each iteration.

I considered that accuracy is good enough to demonstrate, that job was done since

  • it implies both training and evaluation steps
  • succinct result to show

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igorvalko it is possible to output a model as a blob object in Flyte - Here is an example: The benefit is, you can then simply point your notebook to flyteadmin and load the model https://github.com/lyft/flytesnacks/blob/master/python/multi_step_linear/diabetes_xgboost.py#L104

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and lets rename out to accuracy?

Copy link
Contributor

@kumare3 kumare3 Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@outputs(out=Types.String)
@outputs(accuracy=[Types.Float], model=Types.Blob)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kumare3
Copy link
Contributor

kumare3 commented Jun 16, 2020

@igorvalko thank you for all the patience and great work. There is just a small change requested. The reason is, examples are very important to get close to right, as they get replicated quickly

@wild-endeavor
Copy link
Contributor

@kumare3 i think this is fine right?

@kumare3
Copy link
Contributor

kumare3 commented Jun 16, 2020

Looks great to me, lets merge!

@kumare3 kumare3 self-requested a review June 16, 2020 18:43
@wild-endeavor wild-endeavor merged commit 761426a into flyteorg:master Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants