Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug on aarch64-apple-ios: Buffer Validation Illegal MTLStorageMode 0x10 #2322

Open
evilsocket opened this issue Jul 8, 2024 · 20 comments
Open

Comments

@evilsocket
Copy link

I'm running candle with metal acceleration on iOS via uniffi, specifically:

candle-core = { version = "0.6.0", features = ["metal"] }
candle-nn = { version = "0.6.0", features = ["metal"] }
candle-transformers = { version = "0.6.0", features = ["metal"] }

But the call to Device::new_metal panics with:

-[MTLDebugDevice newBufferWithBytes:length:options:]:670: failed assertion `Buffer Validation
Illegal MTLStorageMode 0x10'

After some debugging, I figured this happens on this line due to iOS rejecting the metal::MTLResourceOptions::StorageModeManaged option.

I could replicate with this simple test:

#[uniffi::export]
pub fn test_metal() {
    let device = metal::Device::all().swap_remove(0);

    println!("device: {:?}", &device);

    println!(
        "MTLResourceOptions::StorageModeManaged = 0x{:x}",
        metal::MTLResourceOptions::StorageModeManaged
    );

    // it panics here
    let seed = device.new_buffer_with_data(
        [299792458].as_ptr() as *const std::ffi::c_void,
        4,
        metal::MTLResourceOptions::StorageModeManaged, // <-- 0x10
    );

    println!("seed: {:?}", &seed);
}

Which prints and then panics:

device: <CaptureMTLDevice: 0x105207820> -> <MTLDebugDevice: 0x105207580> -> <AGXG16Device: 0x10800f400>
    name = Apple A17 Pro GPU
MTLResourceOptions::StorageModeManaged = 0x10
-[MTLDebugDevice newBufferWithBytes:length:options:]:670: failed assertion `Buffer Validation
Illegal MTLStorageMode 0x10
'

Other values such as StorageModeShared do not panic, but i'm not sure of the implications also because StorageModeManaged is used in several other places in the metal implementation.

@evilsocket
Copy link
Author

also mentioned in #1841

evilsocket added a commit to evilsocket/cake that referenced this issue Jul 8, 2024
@evilsocket
Copy link
Author

After a bit of digging into Apple MLX and especially how they handle buffer allocation on both macOS and iOS, I found this https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/allocator.cpp#L207

You will see that all allocations are centralized there and they always and only use ResourceStorageModeShared (you won't find references to other storage modes in their metal backend). So it seems like on iOS at least, managed buffers are not needed. It makes sense if we think about it as Metal on macOS must support both Intel (where GPU and RAM are not unified) and Apple Silicon, while Metal on iOS only cares about unified memory, hence no syncing needed / supported.

Unless I've got it all wrong, which is possible.

evilsocket added a commit to evilsocket/candle that referenced this issue Jul 10, 2024
@evilsocket
Copy link
Author

Also related, only the precompiled macOS library is present in this repo #1759

working on it on my fork

@evilsocket
Copy link
Author

My branch compiles and runs on iOS:

Screenshot 2024-07-10 alle 16 08 46

I'm still getting weird errors when I try to have my macOS M1 doing computations with an iPhone (Error: A weight is invalid in distribution), but it works.

@evilsocket
Copy link
Author

Fixes happening here: https://github.com/evilsocket/candle
Integration happening here: https://github.com/evilsocket/llama3-cake

@LaurentMazare
Copy link
Collaborator

"weight is invalid in distribution" usually means that the model generated a nan, so somehow one of the metal kernel probably did not work in the same way on iphone as it would have on macos. It might be good to print the tensors after each op to get a sense of where the nan is coming from.

@evilsocket
Copy link
Author

I have a feeling it comes from this https://codebrowser.dev/tokio/crates/rand-0.8.5/src/distributions/weighted_index.rs.html#454 .. i'm trying to narrow it down but to be honest I have very little experience with Candle so it's taking a while ...

@LaurentMazare
Copy link
Collaborator

Yes it's coming from the sampling, and the input of the sampling is based on the logits generated on the model, I would think that it's almost always caused by one of these values being a nan (or an inf).

@evilsocket
Copy link
Author

tensor printf debugging ftw

@evilsocket
Copy link
Author

@LaurentMazare you were 100% right, the error is in the logits output vector being full of NaN, the sampling doesn't like that ... disabling the kv_cache on both workers and master seem to be improve things, the point being, this is a problem of my tool, the porting of candle by itself seems to work

@evilsocket
Copy link
Author

@LaurentMazare it seems to depend on some (version?) discrepancy between the libMetalFlashAttention.metallib bundled in this repo vs the iOS and macOS libMetalFlashAttention.metallib bundled here https://github.com/philipturner/metal-flash-attention/releases/tag/v1.0.1

In this issue #1759 @ivarflakstad suggests to use the iOS version from that release - however I found out that if I replace the the macOS metallib in this repo with that release, these tests fail https://github.com/evilsocket/candle-metal-tester/blob/main/core/src/lib.rs (even on macOS), suggesting that candle-metal-kernels makes some assumptions on the implementation of that library, that clearly changed over time.

Bottom line, it would be great if the last person to update the lib on this repo ( @FL33TW00D via b6afb46 ) could document which specific version/commit that library was compiled from and if possible also compile the same version for iOS (or point to the right precompiled release). I suspect that aligning both macOS and iOS libMetalFlashAttention versions would solve the issue.

@ivarflakstad
Copy link
Member

We're currently using @FL33TW00D 's fork of mfa which iirc has a tiny change in how buffers are accessed.

@ivarflakstad
Copy link
Member

My plan was actually to fix all these issues as soon as I'm back from vacation.
No need to use xcode 14. No longer be fully dependant on mfa. Introduce candle.metallib.

@evilsocket
Copy link
Author

@ivarflakstad understood, i'll try to compile this for iOS https://github.com/FL33TW00D/metal-flash-attention and wait for proper fix, thanks

@evilsocket
Copy link
Author

@ivarflakstad just tried to compile @FL33TW00D fork (last commit), that is not the one:

found <AGXG13XDevice: 0x13300ca00>
    name = Apple M1 Max

using <AGXG13XDevice: 0x13300ca00>
    name = Apple M1 Max
thread 'main' panicked at core/src/lib.rs:80:5:
assertion `left == right` failed
  left: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 right: [20.0, 23.0, 26.0, 29.0, 56.0, 68.0, 80.0, 92.0]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@evilsocket
Copy link
Author

I've literally tried to compile all branches of that repo and all of them fail the tests

@evilsocket
Copy link
Author

@ivarflakstad i became a little bit obsessed with this so I tried to compile and test every single commit of every branch of both forks of libMetalFlashAttention.metallib, both for macOS and iOS. My results are a bit discouraging:

macOS:

  • 0.1.0 (precompiled from main repo): fails tests and hangs bad the gpu state
  • 0.2.0 (precompiled from main repo): fails tests and hangs bad the gpu state
  • 1.0.0 (precompiled from main repo): fails tests and hangs bad the gpu state
  • 1.0.1 (precompiled from main repo): fails tests
  • batches-2 (branch from FL33TW00D's fork): fails tests
  • larger-batches: (branch from FL33TW00D's fork): fails tests
  • original bundled binary: passes all tests, indicating this version comes from somewhere else.

iOS:

  • 0.1.0 (precompiled from main repo): fails and errors called Result::unwrap() on an Err value: LoadFunctionError("Constant K_splits (212) value is required by function sgemm")
  • 0.2.0 (precompiled from main repo): fails and errors called Result::unwrap() on an Err value: LoadFunctionError("Constant K_splits (212) value is required by function sgemm")
  • 1.0.0 (precompiled from main repo): fails and errors called Result::unwrap() on an Err value: LoadFunctionError("Constant K_splits (212) value is required by function sgemm")
  • 1.0.1 (precompiled from main repo): fails tests
  • batches-2 (branch from FL33TW00D's fork): fails tests
  • larger-batches: (branch from FL33TW00D's fork): fails tests
  • original bundled binary: does not exist :/

It's so frustrating because this single library is the only obstacle to iOS compatibility.

@ivarflakstad
Copy link
Member

Right. I tried to nudge you away from the rabbit hole, but I get the curiosity hehe.
We won’t be dependent on the precompiled MFA library for much longer.

@evilsocket
Copy link
Author

@ivarflakstad i am so looking forward to it - i don't think i have to tell you guys that there's a lot of potential in unlocking iOS. To my knowledge, Candle is currently the closest thing to a single framework supporting pretty much anything in a single codebase.

@sinkingsugar
Copy link

So this is still broken?

-[MTLDebugDevice newBufferWithBytes:length:options:]:723: failed assertion `Buffer Validation
Illegal MTLStorageMode 0x10

on 0.8.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants