Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization: concatenating fixed-size arrays #70439

Closed
moxian opened this issue Mar 26, 2020 · 2 comments
Closed

Missed optimization: concatenating fixed-size arrays #70439

moxian opened this issue Mar 26, 2020 · 2 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@moxian
Copy link
Contributor

moxian commented Mar 26, 2020

The following three functions are the same semantically - they pack two u64s into 16 bytes in big-endian order, but produce different assembly:

// Naive, and, I assume, idiomatic implementation. Bad assembly with several needless moves.
pub fn test(tup: (u64, u64)) -> [u8; 16] {
    let first = (tup.0).to_be_bytes();
    let second = (tup.1).to_be_bytes();
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&first[..]);
    key[8..].copy_from_slice(&second[..]);
    key
}

// Unsafe implementation. Good assembly .
pub fn test2(tup: (u64, u64)) -> [u8; 16] {
    let first: [u8; 8] = (tup.0).to_be_bytes();
    let second: [u8; 8] = (tup.1).to_be_bytes();
    let packed: [[u8;8]; 2] = [first, second];
    let key = unsafe { std::mem::transmute(packed) } ;
    key
}

// No unsafe, good assembly .
pub fn test3(tup: (u64, u64)) -> [u8; 16] {
    let first = (tup.0).to_be_bytes();
    let second = (tup.1).to_be_bytes();
    let mut key = [0u8; 16];
    let (f, s) = key.split_at_mut(8);
    f.copy_from_slice(&first[..]);
    s.copy_from_slice(&second[..]);
    key
}

Credits to @memoryruins for coming up with test3.

Godbolt link: https://godbolt.org/z/4QaBp4

It is surprising that naive implementation (test1) produces worse assembly than the other two. I was told I should report this here as a bug.

Meta

rustc 1.42.0 (as reported by godbolt)

@moxian moxian added the C-bug Category: This is a bug. label Mar 26, 2020
@jonas-schievink jonas-schievink added A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels Mar 26, 2020
@moxian
Copy link
Contributor Author

moxian commented Mar 26, 2020

This might be related to #62446 and/or #67484

@moxian
Copy link
Contributor Author

moxian commented Feb 9, 2021

This appears to have been fixed between 1.44.0 and 1.45.0.
The assembly for all three functions in OP are now identical

example::test:
        mov     rax, rdi
        bswap   rsi
        bswap   rdx
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 8], rdx
        ret

I think this can be closed.

I've checked #62446 and #67484, still have bad assembly, so probably different root causes.

@moxian moxian closed this as completed Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

2 participants