After upgrading to version 1.8.0, the async function `loadModelFromUrl` is not completing when using large models #31
Comments
Probably because the out-of-memory error is now thrown internally by the C++ code (and not by the worker JS code). Can you confirm if you see the error from …
Sounds like the same issue I came across here?
// Doh, you already figured that out :-)
Thanks for the reference. There is a lot of good info in that thread! I've just noticed a pattern regarding this issue: the … When I force it to use … PS: I haven't tested your changes from #34.
ℹ️ This issue (…) Note: iOS browsers don't clear the memory of web workers properly when reloading the page. For instance, if the page is reloaded before calling …
@felladrin Sorry for the late response. Yeah seems like there are a lot of problems with Safari on iOS.
Do you get the same error as last time (i.e. …)?
Perhaps we can make the web worker exit itself when the page reloads. But I'm still hesitant to do this, since it should be the browser's responsibility. I'll have a look at this when I have more time.
Ah, no worries @ngxson! Not sure when I'll try larger models on iOS again, but if I find anything new, I'll share here!
After the launch of iOS 18, most of those out-of-memory issues seem to be gone! 🎉 I noticed that Apple now forces Safari to hard-reload the page when it detects the tab is running too low on memory. After the reload, with more memory available, the models usually run fine. Wllama can easily run 1B models (e.g. Llama 3.2 1B Q4_K_M) on iPhones with less than 6GB of memory.
Even the next iPhone SE is rumored to have 8GB of memory, so Apple is quickly making 8GB the new baseline. (The latest iPhones also come with at least 8GB.)
Something interesting occurred while upgrading to version 1.8.0. Previously, it had been throwing an "Out of Memory" error, but that issue has now been resolved. However, a new problem has surfaced: the async function `loadModelFromUrl` does not complete. It appears to be stuck in a state where it neither resolves nor rejects. It's possible that the error is being caught in the middle of the process and not passed up. This issue can be reproduced with models that are too large to fit into the device's memory; it works perfectly fine with smaller models.
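As a caller-side workaround while the hang is investigated, the load call can be raced against a timeout so that a promise that never settles at least turns into an explicit rejection. A minimal sketch (the `withTimeout` helper and the simulated hanging load are illustrative, not part of wllama's API):

```javascript
// Sketch: race a promise against a timeout so a silent hang
// becomes a catchable rejection.
function withTimeout(promise, ms, message = "Timed out") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Clear the timer regardless of which promise settles first.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulate a load that neither resolves nor rejects, like the reported bug.
const hangingLoad = new Promise(() => {});

withTimeout(hangingLoad, 100, "loadModelFromUrl did not settle")
  .catch((err) => console.log(err.message)); // logs: loadModelFromUrl did not settle
```

This does not free the memory the worker already consumed, but it lets the application surface an error to the user instead of waiting forever.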
It's possible that this problem is related to the changes made in this pull request:
However, as I only encountered this issue on the iOS browser, it's also possible that it's related to this change:
If anyone would like to test this problem, you can use this 10-part split-gguf of TinyLlama on a device with less than 6GB of RAM:
https://huggingface.co/Felladrin/gguf-sharded-TinyLlama-1.1B-1T-OpenOrca/resolve/main/tinyllama-1.1b-1t-openorca.Q3_K_S.shard-00001-of-00010.gguf
(If an even larger model is needed, there are also Q4_K_M and Q8_0 versions available in this repository.)