Replies: 1 comment
-
Is the error the same every time?
-
I'm trying to run inference on my tuned model; I tuned the HF weights, not the original ones, and I used custom data:
dataset:
  _component_: torchtune.datasets.text_completion_dataset
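In full, the dataset section was along these lines (the data file path below is a placeholder, and the exact parameter names can vary slightly between torchtune versions):

```yaml
dataset:
  _component_: torchtune.datasets.text_completion_dataset
  source: json               # loaded through Hugging Face datasets
  data_files: my_data.json   # placeholder path to the custom training data
  column: text               # column in the file holding the raw text
```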
I'm running: tune run generate --config inference.yaml prompt="What are some interesting sites to visit in the Bay Area?"
with this config:
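Roughly — the model builder, checkpoint file names, and paths below are placeholders standing in for my actual values, and the `_component_` paths can differ between torchtune versions:

```yaml
model:
  _component_: torchtune.models.llama2.llama2_7b      # example builder only

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /tmp/complete-model                  # directory with the tuned HF weights
  checkpoint_files: [pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin]
  output_dir: /tmp/complete-model
  model_type: LLAMA2

tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  path: /tmp/complete-model/tokenizer.model

device: cuda
dtype: bf16
seed: 1234

prompt: "What are some interesting sites to visit in the Bay Area?"
max_new_tokens: 300
temperature: 0.6
top_k: 300
```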
Output:
DEBUG:torchtune.utils.logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
INFO:torchtune.utils.logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils.logging:What are some interesting sites to visit in the Bay Area?
INFO:torchtune.utils.logging:Time for inference: 0.65 sec total, 1.54 tokens/sec
INFO:torchtune.utils.logging:Bandwidth achieved: 31.64 GB/s
INFO:torchtune.utils.logging:Memory used: 20.62 GB
I would like help on how I can run inference with this model.
I also tried converting the weights using convert_hf_to_gguf.py from llama.cpp, but I get:
RuntimeError: Internal: could not parse ModelProto from /tmp/complete-model/tokenizer.model
I have tried replacing the tokenizer and re-downloading it from HF, but neither seems to work.
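For what it's worth, that "could not parse ModelProto" message comes from SentencePiece failing to deserialize the file, so a quick way to check the file on its own (assuming the sentencepiece package is installed) is:

```python
# Sanity check of the tokenizer file: if this raises the same "could not parse
# ModelProto" error, the file is not a valid SentencePiece model -- e.g. it may be
# a Hugging Face tokenizer.json or a Git LFS pointer stub instead of the binary
# tokenizer.model that ships with the base model.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("/tmp/complete-model/tokenizer.model")
print("vocab size:", sp.GetPieceSize())
```

If that load fails, the file at /tmp/complete-model/tokenizer.model is probably not the SentencePiece model from the base checkpoint, which would also explain why convert_hf_to_gguf.py cannot parse it.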