-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pytorch plugin usage example #11
Pytorch plugin usage example #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple small changes. Thank you
pytorch/README.rst
Outdated
|
||
Before running this make sure that | ||
- pytorch plugin is enabled in flytepropeller's config | ||
- `Kubeflow pytorch operator`_ is installed in your k8s cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably say that kustomize is configured to deploy pyrotechnics operator
per_replica_gpu_limit="1", | ||
) | ||
def mnist_pytorch_job(workflow_params, no_cuda, batch_size, test_batch_size, epochs, learning_rate, sgd_momentum, seed, log_interval, save_model, dir, out): | ||
backend_type = dist.Backend.GLOO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So eventually I want this boilerplate code to be in the pytorch wrapper, can you add a TODO: simplify by abstracting the boilerplate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we leave TODOs in example/boilerplate code? Maybe it's better to create issue to remember this idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue works too, either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue is great, and the TODO can point to the issue :)
pytorch/workflows/mnist.py
Outdated
dir=dir | ||
) | ||
|
||
accuracies = Output(mnist_result.outputs.out, sdk_type=Types.String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t this be list of floats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/lyft/flytesnacks/pull/11/files#diff-daa70d7f0f13b99d1e904cc0cb7fd7a7R152 I'm stringifying it here to be sure that UI will just have to print string I've provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we dont need to do that, since it is just an array of floats, UI can show it, if we mark it as an array of floats. Do you want to change that?
|
||
RUN pip install awscli | ||
|
||
RUN pip install tensorboardX==2.0.0 flytekit[pytorch]==0.8.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does TensorBoard work in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm we will need to run tensorboard somewhere i guess right?
pytorch/workflows/mnist.py
Outdated
log_interval=Types.Integer, | ||
save_model=Types.Boolean, | ||
dir=Types.String) | ||
@outputs(out=Types.String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs some improvements like changing output to be the model. @wild-endeavor can you help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand. Currently it seems that the output out
is a string, created from str(accuracies)
- I think we should change the name from "out" to "accuracies", and I feel like we should make it the same type that's currently the output of []epoch_step(). I don't know what epoch_step produces - can you fill me in @igorvalko ?
- This is a separate question though from saving the model though - do we want to save the model as a blob you mean? As a demo though, I don't know how that would be useful - one would have to download the model and use it in order to get numbers (like these accuracies) out of it right? We can save the model file sure, but I feel like the accuracies output makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all MNIST in pytorch world is sort of 'hello world' (https://github.com/pytorch/examples/tree/master/mnist)
One can store model on their own by supplying save_model=true
. Here we might only adjust the example to store on s3 (for now it is local FS), but I don't know internals of the model object: whether it will collect distributed model state to master or it's to be done manually.
Epoch - is model training iteration. Accuracy is acquired by evaluating test dataset on a trained model after each iteration.
I considered that accuracy is good enough to demonstrate, that job was done since
- it implies both training and evaluation steps
- succinct result to show
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igorvalko it is possible to output a model as a blob object in Flyte - Here is an example: The benefit is, you can then simply point your notebook to flyteadmin and load the model https://github.com/lyft/flytesnacks/blob/master/python/multi_step_linear/diabetes_xgboost.py#L104
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and lets rename out to accuracy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@outputs(out=Types.String) | |
@outputs(accuracy=[Types.Float], model=Types.Blob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@igorvalko thank you for all the patience and great work. There is just a small change requested. The reason is, examples are very important to get close to right, as they get replicated quickly |
@kumare3 i think this is fine right? |
Looks great to me, lets merge! |
No description provided.