Initial Snowflake Arrow prototype #1

joshuataylor · 2022-06-04T13:02:06Z

Hi!

Thanks so much for checking out this repository, and this PR.

I don't have much experience with Rust (only a few small apps here and there to learn the basics), so this has been a fantastic learning experience.

Please see this discussion for a rundown of my initial thoughts and a fantastic response from @jorgecarleitao.

This Pull Request adds support for converting the Arrow IPC stream files that Snowflake sends in responses from its REST API, using Rustler within Elixir. This lets req_snowflake use it as an optional dependency to decode Arrow alongside the JSON files Snowflake sends.

Approach

The approach I went with initially was to convert the Arrow types to Rust types, which can then be serialised to Elixir types easily. I'm not sure how memcpy would work in this case, or whether there is a good way to do this zero-copy. Right now it feels like Arrow2 parses the data and then we iterate over it again to encode it. I also have a feeling Rustler might be encoding the data as well?
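To make the extra-pass concern concrete, here is a minimal plain-Rust sketch. It does not use Arrow2 or Rustler: `Column`, `Value`, and `column_to_values` are hypothetical names standing in for a decoded Arrow array, a value that would later become an Elixir term, and the conversion step. It walks a columnar buffer and its validity flags once and copies each slot into an owned value, which is roughly the kind of copy described above.

```rust
/// Hypothetical stand-in for a decoded Arrow column:
/// a values buffer plus one validity flag per slot.
enum Column {
    Int64 { values: Vec<i64>, validity: Vec<bool> },
    Utf8 { values: Vec<String>, validity: Vec<bool> },
}

/// Hypothetical stand-in for a value that would later be
/// encoded as an Elixir term (nil, integer, or binary).
#[derive(Debug, PartialEq, Clone)]
enum Value {
    Nil,
    Int(i64),
    Text(String),
}

/// Walk the column once, copying each valid slot into an owned `Value`
/// and mapping invalid (null) slots to `Value::Nil`.
fn column_to_values(column: &Column) -> Vec<Value> {
    match column {
        Column::Int64 { values, validity } => values
            .iter()
            .zip(validity)
            .map(|(v, ok)| if *ok { Value::Int(*v) } else { Value::Nil })
            .collect(),
        Column::Utf8 { values, validity } => values
            .iter()
            .zip(validity)
            .map(|(v, ok)| if *ok { Value::Text(v.clone()) } else { Value::Nil })
            .collect(),
    }
}
```

In this sketch the string column clones every value, so the data is copied once here and would be copied again if a later step encodes `Value` into terms. Encoding directly while iterating the column would remove the intermediate `Vec<Value>`, at the cost of coupling the decoder to the term encoder.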

Feedback

I would love feedback around the following areas:

1/ What is the best way to return Rust types to Elixir as Elixir types (with casting as an option, so we could return a Snowflake date as an Elixir Date without having to convert it in Elixir)?
2/ Is the way I'm doing the encoding good? I think I could just move the encoder into the main function and return a Vec of Terms, right? It feels a bit like pass the parcel: the main function sends data off to be encoded, and that function then encodes it in yet another function.
3/ Would it be worthwhile to use multithreading (rayon or similar)? I'm guessing not unless it's a huge amount of data, and from what I can see Snowflake only sends us <20mb per file.
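For question 1, one option is to do the date arithmetic on the Rust side. The sketch below assumes the Snowflake date arrives as a count of days since 1970-01-01 (as in Arrow's Date32 type); that is an assumption, not something this PR confirms. `days_to_ymd` is a hypothetical helper using the standard civil-from-days calculation, and it uses only the Rust standard library.

```rust
/// Convert days since 1970-01-01 into a proleptic Gregorian (year, month, day).
/// Assumption: the input is a day count relative to the Unix epoch.
fn days_to_ymd(days: i32) -> (i32, u32, u32) {
    let z = days as i64 + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // number of whole 400-year cycles
    let doe = z.rem_euclid(146_097); // day within the cycle, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year within the cycle
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from 1 March
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following calendar year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year as i32, month, day)
}
```

A tuple in this shape can be turned into a Date on the Elixir side with `Date.from_erl!/1`; building the struct directly inside the NIF would be a further step that this sketch does not attempt.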

I've added tests to check that the data converts correctly and that the NIF works, so there are a lot of extra files, sorry.

Main files:
lib.rs:

https://github.com/joshuataylor/snowflake_arrow/pull/1/files#diff-a329bc6502ab443b8ba9f5ce11b9d2cef36f441771d5d11437574d6e9b1c58a4

Here are some initial benchmarking results. These don't cast into Elixir structs etc. yet:

Laptop, Apple MacBook M1

CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.13.4
Erlang 25.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s

Benchmarking arrow_large (9.4mb) ...
Benchmarking arrow_small (368kb) ...

Name                          ips        average  deviation         median         99th %
arrow_small (368kb)        1.15 K        0.87 ms    ±21.48%        0.79 ms        1.20 ms
arrow_large (9.4mb)       0.148 K        6.76 ms    ±25.01%        5.91 ms        9.32 ms

Comparison:
arrow_small (368kb)        1.15 K
arrow_large (9.4mb)       0.148 K - 7.77x slower +5.89 ms

Desktop (slowish single core performance)

Operating System: Linux
CPU Information: AMD Ryzen Threadripper 2990WX 32-Core Processor
Number of Available Cores: 64
Available memory: 125.82 GB
Elixir 1.13.4
Erlang 25.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s

Benchmarking arrow_large (9.4mb) ...
Benchmarking arrow_small (368kb) ...

Name                          ips        average  deviation         median         99th %
arrow_small (368kb)        488.32        2.05 ms    ±44.27%        1.45 ms        3.80 ms
arrow_large (9.4mb)         56.52       17.69 ms     ±9.90%       17.45 ms       24.26 ms

Comparison: 
arrow_small (368kb)        488.32
arrow_large (9.4mb)         56.52 - 8.64x slower +15.65 ms

Desktop, AMD Ryzen 5 5600X

Operating System: Linux
CPU Information: AMD Ryzen 5 5600X 6-Core Processor
Number of Available Cores: 12
Available memory: 31.33 GB
Elixir 1.13.4
Erlang 25.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s

Benchmarking arrow_large (9.4mb) ...
Benchmarking arrow_small (368kb) ...

Name                          ips        average  deviation         median         99th %
arrow_small (368kb)        1.08 K        0.92 ms    ±22.65%        0.83 ms        1.28 ms
arrow_large (9.4mb)       0.121 K        8.25 ms    ±28.22%        7.47 ms       11.54 ms

Comparison: 
arrow_small (368kb)        1.08 K
arrow_large (9.4mb)       0.121 K - 8.94x slower +7.33 ms
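As a rough way to read these numbers against question 3, the average times can be turned into throughput. The helper below is a hypothetical sketch that treats "9.4mb" as 9.4 megabytes and uses the average column from the tables above.

```rust
/// Throughput in MB/s for a payload of `size_mb` megabytes
/// decoded in `avg_ms` milliseconds on average.
fn throughput_mb_per_s(size_mb: f64, avg_ms: f64) -> f64 {
    size_mb / (avg_ms / 1000.0)
}
```

Under that assumption, `throughput_mb_per_s(9.4, 6.76)` for the M1 works out to roughly 1.4 GB/s on a single core, and `throughput_mb_per_s(9.4, 17.69)` for the Threadripper to roughly 0.5 GB/s, so a file under 20mb would decode in tens of milliseconds at most.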

Create initial PR

bfe7d89

joshuataylor force-pushed the initial branch from 5043b5f to bfe7d89 Compare June 4, 2022 13:03

joshuataylor added 5 commits June 4, 2022 21:04

remove crud

d39d71f

fix benchmark

374c2f5

rework to be a bit cleaner

326d94d

benchmark data for json/arrow

6550101

fix benchmark

4a7aca8

joshuataylor closed this Jun 22, 2022
