ddnovikov submission #7
base: main
Conversation
…rks under k6's load but obviously optimizations are required.
… to cost lots of time but are going to be marginal. Timeout/batch_size settings may be tied to the HW I used for the tests.
…k space for submission.
Hello @ddnovikov, thank you for the great submission. We deployed your project, but unfortunately the models didn't load properly on a GPU; here are the logs: logs
I know it's time-consuming to deal with this type of issue. If you want to proceed with the challenge, we'd recommend you:
Oh, I see, thank you! Actually, I have a very fun idea I'd like to play around with, if only to proudly put it on my GitHub page later 😁. I think it may be even more of a mess in terms of GPU stuff (considering the issue above), but anything can be fixed given time and effort. For the purposes of the challenge I'm not going to disclose the idea now, but judging by previous challenges I guess I have 2-3 more weeks to implement it, right?
There's no deadline yet; I think it's safe to say that you have 2-3 weeks.
Hey @ddnovikov, the issues with GPU usage were resolved, thank you for the beautiful solution. Here are our test results on a Grafana dashboard. If you would like to work on your Python solution further, you can continue optimizing/improving it and re-request our review once done. Any contribution during the challenge period will be taken into account when choosing a winner. Many thanks! P.S. I'll come back with the dashboard for the Rust solution a bit later today.
Hi @rsolovev! I decided to experiment with some improvements in my spare time. Could you please run the tests? Thanks!
@ddnovikov sure, here are the results for the latest commit -- grafana
@rsolovev, thanks! I made some more improvements; could you please check them again?
@ddnovikov -- new peak throughput record! -- grafana
@rsolovev Yay! I should also say that on newer Ampere-architecture GPUs (such as the NVIDIA A16 I used) this code processes twice as many iterations -- I reached a total of 15800. The funny thing is that the server I rented (6 vCPU, 64 GB RAM, NVIDIA A16) costs 32% less ($0.512/hr vs $0.752/hr) than a g4dn.2xlarge while giving 2.1x better performance 🙃
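For what it's worth, the rough arithmetic behind that comparison is sketched below. Note that the g4dn.2xlarge iteration count is not a measured number from the dashboard; it is back-derived from the quoted 2.1x ratio.

```python
# Back-of-the-envelope price/performance check, using the figures quoted above.
# The g4dn.2xlarge iteration count is an assumption derived from the 2.1x claim.
a16_price, g4dn_price = 0.512, 0.752    # $/hr
a16_iters = 15_800                      # measured on the A16 server
g4dn_iters = a16_iters / 2.1            # ~7_500, inferred, not measured

print(f"A16 is {1 - a16_price / g4dn_price:.0%} cheaper per hour")        # ~32%
print(f"cost per 1k iterations: A16 ${1000 * a16_price / a16_iters:.3f}, "
      f"g4dn ${1000 * g4dn_price / g4dn_iters:.3f}")                      # roughly 3x cheaper per iteration
```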
@rsolovev Also, could you please tell me which CUDA version is installed on the machines you're using?
Hi @rsolovev, I have another attempt at ONNX. I have no idea if it runs, because it didn't run on my machine; I suspect a driver issue that I can't resolve with the hardware I have. I just hope it runs out of the box on your g4dn 🤣
@ddnovikov ONNX seems to launch successfully, but there are errors on request:
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<model_inference_task() done, defined at /code/./app.py:21> exception=IndexError('list index out of range')>
Traceback (most recent call last):
File "/code/./app.py", line 47, in model_inference_task
logits = model(**encoded_input).logits
File "/opt/conda/lib/python3.10/site-packages/optimum/modeling_base.py", line 85, in __call__
return self.forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 1234, in forward
io_binding, output_shapes, output_buffers = self.prepare_io_binding(
File "/opt/conda/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 807, in prepare_io_binding
return self._prepare_io_binding(self.model, ordered_input_names=ordered_input_names, *model_inputs)
File "/opt/conda/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 752, in _prepare_io_binding
name = ordered_input_names[idx]
IndexError: list index out of range
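For context, this IndexError inside optimum's IO binding typically appears when the tokenizer emits more inputs (e.g. token_type_ids) than the exported ONNX graph declares. A minimal sketch of one possible workaround follows; the model, tokenizer and batch_texts names are assumed from the submission's app.py, which is not shown in this thread.

```python
# Possible workaround sketch (assumes `model` is an optimum ORTModel instance and
# `tokenizer` / `batch_texts` come from app.py). As the traceback shows, `model.model`
# is the underlying onnxruntime.InferenceSession.
expected_inputs = {node.name for node in model.model.get_inputs()}

encoded_input = tokenizer(
    batch_texts,
    padding=True,
    truncation=True,
    max_length=512,      # also silences the "no maximum length" truncation warning
    return_tensors="pt",
)

# Drop tokenizer outputs that the ONNX graph does not declare (e.g. token_type_ids),
# so the IO binding sees exactly the inputs it expects.
encoded_input = {k: v for k, v in encoded_input.items() if k in expected_inputs}

logits = model(**encoded_input).logits
```

Whether that is the actual cause here would need to be checked against the exported graph's declared inputs.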
Hi!
Here's my solution. I'm not 100% sure I enabled the GPU in the Helm values correctly, but hopefully it's OK. Please advise on the changes needed if it doesn't work -- I don't have experience with Kubernetes and Helm.
I have some ideas about what could still be improved or experimented with here to boost performance. But in my experience: a) from this point most optimizations will be very time-consuming and give very marginal gains; b) even where there are good optimization ideas (e.g. spending the time to convert these models to ONNX?), they need to be weighed against the real-world usage context to be worth it. So my final solution is what seems reasonable given the requirements and the ~15-20 hours I was ready to spend on the challenge.
But also, for the sake of experience, I'll be happy to hear about any easy ideas for serious optimization if you have some! Thanks!
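For reference, the batching scheme hinted at by the commit messages and the traceback (a model_inference_task coroutine draining an asyncio queue, flushing on either a batch size or a timeout) would look roughly like the sketch below. The identifiers BATCH_SIZE, BATCH_TIMEOUT, request_queue and run_model are illustrative assumptions, not the exact code from app.py.

```python
import asyncio

# Illustrative micro-batching loop: flush when BATCH_SIZE items are queued or when
# BATCH_TIMEOUT seconds pass, whichever comes first. run_model() is a hypothetical
# stand-in for one batched forward pass of the real model.
BATCH_SIZE = 32
BATCH_TIMEOUT = 0.01

request_queue: asyncio.Queue = asyncio.Queue()

async def model_inference_task():
    while True:
        batch = [await request_queue.get()]                  # wait for the first request
        deadline = asyncio.get_running_loop().time() + BATCH_TIMEOUT
        while len(batch) < BATCH_SIZE:
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(request_queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        texts, futures = zip(*batch)                         # each item is a (text, Future) pair
        results = run_model(list(texts))                     # one batched forward pass
        for fut, result in zip(futures, results):
            fut.set_result(result)                           # unblock the waiting handlers
```

Each request handler would then put a (text, future) pair onto the queue and await the future; the batch size/timeout trade-off is exactly what makes those two settings hardware-dependent.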