Hi!
Thanks so much for checking out this repository, and this PR.
I don't have much experience with Rust (only a few small apps here and there to learn the basics), so this has been a fantastic learning experience.
Please see this discussion for a rundown of my initial thoughts, and a fantastic response from @jorgecarleitao.
This pull request adds support for decoding the Arrow IPC stream data that Snowflake sends in responses from its REST API, using Rustler to expose the decoder to Elixir. This lets req_snowflake use it as an optional dependency to decode Arrow results alongside the JSON results Snowflake sends.
Approach
My initial approach converts the Arrow types to Rust types, which can then easily be encoded as Elixir terms. I'm not sure how memcpy would fit in here, or how to do this in a zero-copy way. Right now arrow2 parses the data, and then we iterate over it a second time to encode it. I also suspect Rustler may be encoding the data again on top of that.
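For reference, here is a minimal std-only sketch of what the "Arrow types to Rust types" step does underneath. In the Arrow format, nulls in a primitive column are stored in a separate validity bitmap (one bit per row, least-significant bit first, 1 = valid), so a column can be turned into `Vec<Option<T>>` in a single pass over the values and the bitmap. The names `is_valid` and `to_options` are hypothetical; as far as I know, arrow2's primitive arrays can already be iterated as options, so this is only meant to illustrate the idea, not to replace arrow2.

```rust
/// Returns true if row `i` is valid (non-null) according to an Arrow
/// validity bitmap: bit `i % 8` of byte `i / 8`, least-significant bit first.
fn is_valid(bitmap: &[u8], i: usize) -> bool {
    ((bitmap[i / 8] >> (i % 8)) & 1) == 1
}

/// Converts a primitive column (values buffer + optional validity bitmap)
/// into `Vec<Option<T>>` in one pass. A missing bitmap means "no nulls",
/// as in the Arrow format.
fn to_options<T: Copy>(values: &[T], validity: Option<&[u8]>) -> Vec<Option<T>> {
    values
        .iter()
        .enumerate()
        .map(|(i, v)| match validity {
            Some(bits) if !is_valid(bits, i) => None,
            _ => Some(*v),
        })
        .collect()
}

fn main() {
    // Four rows; bitmap 0b0000_1101 marks rows 0, 2 and 3 as valid, row 1 as null.
    let values = [10_i64, 0, 30, 40];
    let bitmap = [0b0000_1101_u8];
    let column = to_options(&values, Some(&bitmap[..]));
    println!("{:?}", column); // [Some(10), None, Some(30), Some(40)]
}
```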
Feedback
I would love feedback around the following areas:
1/ What is the best way to return Rust types to Elixir as Elixir types, ideally with an option to cast them, so that a Snowflake date could be returned as an Elixir Date without having to convert it in Elixir?
2/ Is the way I'm doing the encoding reasonable? I think I could move the encoder into the main function and return a Vec of Terms, right? At the moment it feels a bit like pass the parcel: one function hands the data off to be encoded, and the actual encoding happens in another function.
3/ Would it be worthwhile using multithreading (rayon or similar)? I'm guessing not unless there's a huge amount of data, and from what I can see Snowflake only sends us under 20 MB.
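On question 1/, one option is to do the date arithmetic in Rust and hand Elixir the calendar fields directly. This is a minimal sketch, assuming Snowflake encodes `DATE` columns as Arrow `Date32` values (days since 1970-01-01); that assumption should be checked against the actual schema. It uses Howard Hinnant's well-known `civil_from_days` algorithm. The names `SnowflakeDate` and `date32_to_civil` are hypothetical. On the Rustler side, a struct like this could then be mapped to an Elixir `Date` struct (Rustler has a `NifStruct` derive for encoding structs), though that part is not shown here, and Elixir's `Date` also needs its `calendar` field set.

```rust
/// Calendar date fields, mirroring what an Elixir `Date` needs.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct SnowflakeDate {
    year: i64,
    month: u32,
    day: u32,
}

/// Converts days since 1970-01-01 (assumed Arrow Date32) into a proleptic
/// Gregorian date, using Howard Hinnant's `civil_from_days` algorithm.
fn date32_to_civil(days: i32) -> SnowflakeDate {
    let z = days as i64 + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year eras
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year starting March 1, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index starting at March, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    SnowflakeDate { year, month, day }
}

fn main() {
    for days in [0, -1, 18_321, 19_000] {
        println!("{} -> {:?}", days, date32_to_civil(days));
    }
}
```

Doing this in Rust keeps the conversion next to the decoding, so Elixir never sees the raw day count.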
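On question 2/, the "pass the parcel" shape can usually be flattened into one function that matches on the column type once and produces the output values directly, rather than building an intermediate collection and encoding it in a separate step. This is a std-only sketch: `Column` and `Term` are hypothetical stand-ins. In the real NIF, each value would instead be turned into a `rustler::Term` through Rustler's `Encoder` trait using the NIF `Env`.

```rust
/// Hypothetical decoded column, standing in for arrow2's typed arrays.
enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
    Boolean(Vec<Option<bool>>),
}

/// Hypothetical stand-in for an Elixir term; `Nil` plays the role of `nil`.
#[derive(Debug, PartialEq)]
enum Term {
    Integer(i64),
    Binary(String),
    Boolean(bool),
    Nil,
}

/// Converts one column to terms in a single pass: one match on the column
/// type, then a direct map over its values, with no intermediate Vec
/// between the decoded column and the output.
fn column_to_terms(column: Column) -> Vec<Term> {
    match column {
        Column::Int64(values) => values
            .into_iter()
            .map(|v| v.map_or(Term::Nil, Term::Integer))
            .collect(),
        Column::Utf8(values) => values
            .into_iter()
            .map(|v| v.map_or(Term::Nil, Term::Binary))
            .collect(),
        Column::Boolean(values) => values
            .into_iter()
            .map(|v| v.map_or(Term::Nil, Term::Boolean))
            .collect(),
    }
}

fn main() {
    let columns = vec![
        Column::Int64(vec![Some(1), None]),
        Column::Utf8(vec![Some("a".to_string()), None]),
        Column::Boolean(vec![None, Some(true)]),
    ];
    let terms: Vec<Vec<Term>> = columns.into_iter().map(column_to_terms).collect();
    println!("{:?}", terms);
}
```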
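On question 3/, if parallelism ever does help, one low-risk shape is converting columns independently, since columns don't depend on each other. rayon would be the usual choice; this std-only sketch uses `std::thread::scope` (Rust 1.63+) instead so it runs without extra crates, and the per-column work (`sum_column`) is a hypothetical placeholder for the real conversion. For payloads under 20 MB, thread start-up cost may well outweigh the gain, so it's worth benchmarking the parallel and sequential versions against each other before choosing.

```rust
use std::thread;

/// Hypothetical per-column work; in the real NIF this would be the
/// Arrow-to-term conversion for one column.
fn sum_column(column: &[i64]) -> i64 {
    column.iter().sum()
}

/// Processes each column on its own scoped thread and collects the results
/// in the original column order.
fn process_columns_parallel(columns: &[Vec<i64>]) -> Vec<i64> {
    thread::scope(|s| {
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || sum_column(col)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

/// Sequential baseline to benchmark against the parallel version.
fn process_columns_sequential(columns: &[Vec<i64>]) -> Vec<i64> {
    columns.iter().map(|col| sum_column(col)).collect()
}

fn main() {
    let columns: Vec<Vec<i64>> = (0..4)
        .map(|c| (0..1_000).map(|x| x * c).collect())
        .collect();
    assert_eq!(
        process_columns_parallel(&columns),
        process_columns_sequential(&columns)
    );
    println!("{:?}", process_columns_parallel(&columns));
}
```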
I've added tests to check that data converts correctly and that the NIF works, so there are a lot of extra files, sorry.
Main files:
lib.rs:
https://github.com/joshuataylor/snowflake_arrow/pull/1/files#diff-a329bc6502ab443b8ba9f5ce11b9d2cef36f441771d5d11437574d6e9b1c58a4
Here are some initial benchmark results (these don't cast into Elixir structs yet):
Laptop, Apple MacBook M1
Desktop (slowish single core performance)
Desktop, AMD Ryzen 5 5600X