This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Consider migrating to portable-simd #580

Closed
sundy-li opened this issue Nov 6, 2021 · 11 comments · Fixed by #747
Labels
investigation Issues or PRs that are investigations. Prs may or may not be merged. no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@sundy-li
Collaborator

sundy-li commented Nov 6, 2021

It seems packed_simd_2 is no longer being developed. Should we consider migrating to portable-simd?

https://github.com/rust-lang/portable-simd/

@jorgecarleitao jorgecarleitao added investigation Issues or PRs that are investigations. Prs may or may not be merged. help wanted Extra attention is needed labels Nov 10, 2021
@jorgecarleitao
Owner

IMO this is a super cool issue. If anyone would like to pick it up, go ahead.

@jorgecarleitao
Owner

FWIW, a quick perf benchmark shows a benefit for the sum of non-null values:

core_simd_sum 2^20 f32     time:   [174.18 us 174.37 us 174.59 us]
packed_simd_sum 2^20 f32   time:   [183.91 us 184.10 us 184.33 us]
nonsimd_sum 2^20 f32       time:   [193.12 us 194.76 us 197.03 us]
naive_sum null 2^20 f32    time:   [1.6468 ms 1.6513 ms 1.6555 ms]

with

#![feature(portable_simd)]

use criterion::{criterion_group, criterion_main, Criterion};

use std::convert::TryInto;

use core_simd::f32x16;
use packed_simd::f32x16 as p_f32x16;

const LANES: usize = 16;

pub fn packed_simd_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();

    let sum = chunks.fold(p_f32x16::default(), |acc, chunk| {
        let chunk: [f32; 16] = chunk.try_into().unwrap();
        let chunk: p_f32x16 = p_f32x16::from_slice_unaligned(&chunk);

        acc + chunk
    });

    let remainder: f32 = remainder.iter().copied().sum();

    sum.sum() + remainder
}

pub fn core_simd_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();

    let sum = chunks.fold(f32x16::default(), |acc, chunk| {
        let chunk: [f32; 16] = chunk.try_into().unwrap();
        let chunk: f32x16 = f32x16::from_array(chunk);

        acc + chunk
    });

    let remainder: f32 = remainder.iter().copied().sum();

    let mut reduced = 0.0f32;
    for i in 0..LANES {
        reduced += sum[i];
    }
    reduced + remainder
}

pub fn nonsimd_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();

    let sum = chunks.fold([0.0f32; LANES], |mut acc, chunk| {
        let chunk: [f32; LANES] = chunk.try_into().unwrap();
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
        acc
    });

    let remainder: f32 = remainder.iter().copied().sum();

    let mut reduced = 0.0f32;
    (0..LANES).for_each(|i| {
        reduced += sum[i];
    });
    reduced + remainder
}

pub fn naive_sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

fn add_benchmark(c: &mut Criterion) {
    (10..=20).step_by(2).for_each(|log2_size| {
        let size = 2usize.pow(log2_size);
        let array = (0..size)
            .map(|x| std::f32::consts::PI * x as f32 * x as f32 - std::f32::consts::PI * x as f32)
            .collect::<Vec<_>>();

        c.bench_function(&format!("core_simd_sum 2^{} f32", log2_size), |b| {
            b.iter(|| core_simd_sum(&array))
        });
        c.bench_function(&format!("packed_simd_sum 2^{} f32", log2_size), |b| {
            b.iter(|| packed_simd_sum(&array))
        });
        c.bench_function(&format!("nonsimd_sum 2^{} f32", log2_size), |b| {
            b.iter(|| nonsimd_sum(&array))
        });
        c.bench_function(&format!("naive_sum null 2^{} f32", log2_size), |b| {
            b.iter(|| naive_sum(&array))
        });
    });
}

criterion_group!(benches, add_benchmark);
criterion_main!(benches);

and

[package]
name = "test"
version = "0.1.0"
edition = "2018"

[dependencies]
core_simd = { git = "https://github.com/rust-lang/portable-simd" }
packed_simd = { version = "0.3", package = "packed_simd_2" }

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "sum"
harness = false
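
The lane-wise variants above add the values in a different order than `naive_sum`, and floating-point addition is not associative, so the results can differ in the last bits. Below is a minimal sketch, on stable Rust, of a sanity check one could run before trusting the timings. `chunked_sum` mirrors the structure of `nonsimd_sum`; `relative_diff` is a hypothetical helper that is not part of the benchmark.

```rust
/// Number of independent accumulators, mirroring `LANES` in the benchmark.
const LANES: usize = 16;

/// Lane-wise sum with the same structure as `nonsimd_sum`:
/// LANES running totals, reduced once at the end.
pub fn chunked_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();

    let lanes = chunks.fold([0.0f32; LANES], |mut acc, chunk| {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
        acc
    });

    // Elements that did not fill a whole chunk are summed separately.
    let tail: f32 = remainder.iter().copied().sum();
    lanes.iter().sum::<f32>() + tail
}

/// Hypothetical helper: relative difference between two sums.
pub fn relative_diff(a: f32, b: f32) -> f32 {
    (a - b).abs() / a.abs().max(b.abs()).max(f32::MIN_POSITIVE)
}
```

Comparing `chunked_sum(&array)` against `array.iter().sum::<f32>()` with a small tolerance on `relative_diff`, rather than with `==`, avoids false alarms caused purely by the summation order.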

@jorgecarleitao
Owner

See also https://github.com/DataEngineeringLabs/simd-benches, where I am benchmarking the algorithms.

@Dandandan
Collaborator

A nice feature of portable-simd is its support for gather operations, which could speed up the take kernel.
Reference:
https://rust-lang.github.io/portable-simd/core_simd/simd/struct.Simd.html#method.gather_or
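
To make the idea concrete, here is a scalar sketch of what a gather-based take would compute. It runs on stable Rust and does not call portable-simd: `gather_or_scalar` only emulates the documented behaviour of `gather_or` (out-of-bounds lanes fall back to the `or` value), and `take_chunked` is a hypothetical kernel name, not arrow2's API.

```rust
use std::convert::TryInto;

/// Scalar emulation of the gather semantics: lane `i` of the result is
/// `values[indices[i]]` when the index is in bounds, otherwise `or[i]`.
pub fn gather_or_scalar<const N: usize>(
    values: &[f32],
    indices: [usize; N],
    or: [f32; N],
) -> [f32; N] {
    let mut out = or;
    for (lane, &idx) in out.iter_mut().zip(indices.iter()) {
        if let Some(&v) = values.get(idx) {
            *lane = v;
        }
    }
    out
}

/// Hypothetical `take` that processes N indices per step, the way a
/// SIMD gather would, and handles the leftover indices one by one.
pub fn take_chunked<const N: usize>(
    values: &[f32],
    indices: &[usize],
    default: f32,
) -> Vec<f32> {
    let mut out = Vec::with_capacity(indices.len());
    let chunks = indices.chunks_exact(N);
    let remainder = chunks.remainder();

    for chunk in chunks {
        let idx: [usize; N] = chunk.try_into().expect("chunks_exact yields N items");
        out.extend_from_slice(&gather_or_scalar(values, idx, [default; N]));
    }
    for &i in remainder {
        out.push(values.get(i).copied().unwrap_or(default));
    }
    out
}
```

With a real SIMD gather, the inner loop of `gather_or_scalar` would become a single vector operation per chunk; whether that is faster in practice would need a benchmark like the one above.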

@jorgecarleitao
Owner

Waiting for rust-lang/portable-simd#197

@Igosuki
Contributor

Igosuki commented Jan 12, 2022

Nota bene: this prevents compiling datafusion on stable with simd enabled when using arrow2.
Edit: scratch that, as simd is only available on nightly to begin with.

@ritchie46
Collaborator

> Nota bene: this prevents compiling datafusion on stable with simd enabled when using arrow2.

Maybe we can have two simd implementations separated by feature flags?
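
A rough sketch of what that could look like, assuming a hypothetical Cargo feature named `simd` (declared as `simd = []` under `[features]`). The `std::simd` calls in the gated module follow the nightly API as I understand it and may have been renamed; that module is only compiled when the feature is enabled, so the fallback builds on stable.

```rust
// At the crate root, nightly is only required when the feature is on:
// #![cfg_attr(feature = "simd", feature(portable_simd))]

/// SIMD implementation, compiled only with `--features simd` on nightly.
#[cfg(feature = "simd")]
mod sum_impl {
    use std::simd::prelude::*;

    pub fn sum(values: &[f32]) -> f32 {
        let chunks = values.chunks_exact(16);
        let remainder = chunks.remainder();
        let acc = chunks.fold(f32x16::splat(0.0), |acc, chunk| {
            acc + f32x16::from_slice(chunk)
        });
        acc.reduce_sum() + remainder.iter().sum::<f32>()
    }
}

/// Fallback implementation, compiled on stable when the feature is off.
#[cfg(not(feature = "simd"))]
mod sum_impl {
    pub fn sum(values: &[f32]) -> f32 {
        values.iter().sum()
    }
}

/// Callers use a single entry point and never see which module was picked.
pub fn sum(values: &[f32]) -> f32 {
    sum_impl::sum(values)
}
```

The same pattern would extend to a third module backed by packed_simd under another feature name, at the cost of keeping several implementations in sync.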

@Igosuki
Contributor

Igosuki commented Jan 12, 2022

Features tied to a particular rustc toolchain make things very clunky... I think we'll have to limit datafusion on arrow2 to Rust nightly for now.

@jorgecarleitao
Owner

Does datafusion compile on stable with simd enabled? I think it depends on arrow, which depends on packed_simd, which requires nightly (but it has been a while).

My understanding is that simd in our whole stack (arrow, arrow2, datafusion, polars, databend, etc.) is currently only available on nightly. As far as I understand, this is one of the issues the simd working group is addressing with std::simd: making simd available on stable.

@Igosuki
Contributor

Igosuki commented Jan 12, 2022

My bad, it is in fact only available on nightly.

@jorgecarleitao jorgecarleitao added no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog and removed help wanted Extra attention is needed labels Mar 6, 2022