-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug on aarch64-apple-ios: Buffer Validation Illegal MTLStorageMode 0x10 #2322
Comments
also mentioned in #1841 |
After a bit of digging into Apple MLX and especially how they handle buffer allocation on both macOS and iOS, I found this https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/allocator.cpp#L207 You will see that all allocations are centralized there and they always and only use ResourceStorageModeShared (you won't find references to other storage modes in their metal backend). So it seems like on iOS at least, managed buffers are not needed. It makes sense if we think about it as Metal on macOS must support both Intel (where GPU and RAM are not unified) and Apple Silicon, while Metal on iOS only cares about unified memory, hence no syncing needed / supported. Unless I've got it all wrong, which is possible. |
Also related, only the precompiled macOS library is present in this repo #1759 working on it on my fork |
Fixes happening here: https://github.com/evilsocket/candle |
"weight is invalid in distribution" usually means that the model generated a nan, so somehow one of the metal kernel probably did not work in the same way on iphone as it would have on macos. It might be good to print the tensors after each op to get a sense of where the nan is coming from. |
I have a feeling it comes from this https://codebrowser.dev/tokio/crates/rand-0.8.5/src/distributions/weighted_index.rs.html#454 .. i'm trying to narrow it down but to be honest I have very little experience with Candle so it's taking a while ... |
Yes it's coming from the sampling, and the input of the sampling is based on the logits generated on the model, I would think that it's almost always caused by one of these values being a nan (or an inf). |
tensor printf debugging ftw |
@LaurentMazare you were 100% right, the error is in the logits output vector being full of NaN, the sampling doesn't like that ... disabling the kv_cache on both workers and master seem to be improve things, the point being, this is a problem of my tool, the porting of candle by itself seems to work |
@LaurentMazare it seems to depend on some (version?) discrepancy between the libMetalFlashAttention.metallib bundled in this repo vs the iOS and macOS libMetalFlashAttention.metallib bundled here https://github.com/philipturner/metal-flash-attention/releases/tag/v1.0.1 In this issue #1759 @ivarflakstad suggests to use the iOS version from that release - however I found out that if I replace the the macOS metallib in this repo with that release, these tests fail https://github.com/evilsocket/candle-metal-tester/blob/main/core/src/lib.rs (even on macOS), suggesting that candle-metal-kernels makes some assumptions on the implementation of that library, that clearly changed over time. Bottom line, it would be great if the last person to update the lib on this repo ( @FL33TW00D via b6afb46 ) could document which specific version/commit that library was compiled from and if possible also compile the same version for iOS (or point to the right precompiled release). I suspect that aligning both macOS and iOS libMetalFlashAttention versions would solve the issue. |
We're currently using @FL33TW00D 's fork of mfa which iirc has a tiny change in how buffers are accessed. |
My plan was actually to fix all these issues as soon as I'm back from vacation. |
@ivarflakstad understood, i'll try to compile this for iOS https://github.com/FL33TW00D/metal-flash-attention and wait for proper fix, thanks |
@ivarflakstad just tried to compile @FL33TW00D fork (last commit), that is not the one:
|
I've literally tried to compile all branches of that repo and all of them fail the tests |
@ivarflakstad i became a little bit obsessed with this so I tried to compile and test every single commit of every branch of both forks of libMetalFlashAttention.metallib, both for macOS and iOS. My results are a bit discouraging: macOS:
iOS:
It's so frustrating because this single library is the only obstacle to iOS compatibility. |
Right. I tried to nudge you away from the rabbit hole, but I get the curiosity hehe. |
@ivarflakstad i am so looking forward to it - i don't think i have to tell you guys that there's a lot of potential in unlocking iOS. To my knowledge, Candle is currently the closest thing to a single framework supporting pretty much anything in a single codebase. |
So this is still broken?
on 0.8.1 |
I'm running candle with metal acceleration on iOS via uniffi, specifically:
But the call to
Device::new_metal
panics with:After some debugging, I figured this happens on this line due to iOS rejecting the
metal::MTLResourceOptions::StorageModeManaged
option.I could replicate with this simple test:
Which prints and then panics:
Other values such as StorageModeShared do not panic, but i'm not sure of the implications also because StorageModeManaged is used in several other places in the metal implementation.
The text was updated successfully, but these errors were encountered: