This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Add page_size opt to control array_to_pages #1291

Closed
sundy-li opened this issue Nov 4, 2022 · 0 comments · Fixed by #1303
Labels
no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@sundy-li
Collaborator

sundy-li commented Nov 4, 2022

It seems the split threshold of 2^31 - 2^25 bytes is a fixed value now, which may be too large.

    // maximum page size is 2^31 e.g. i32::MAX
    // we split at 2^31 - 2^25 to err on the safe side
    // we also check for an array.len > 3 to prevent infinite recursion
    // still have to figure out how to deal with values that are i32::MAX size, such as very large
    // strings or a list column with many elements
    if (estimated_bytes_size(array)) >= (2u32.pow(31) - 2u32.pow(25)) as usize && array.len() > 3 {
        let split_at = array.len() / 2;
        let left = array.slice(0, split_at);
        let right = array.slice(split_at, array.len() - split_at);

        Ok(DynIter::new(
            array_to_pages(&*left, type_.clone(), nested, options, encoding)?
                .chain(array_to_pages(&*right, type_, nested, options, encoding)?),
        ))
    }

parquet.page.size: The page size is for compression. When reading, each page can be decompressed independently. A block is composed of pages. The page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. Default size is 1048576 bytes (= 1 * 1024 * 1024).
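
As a rough illustration of the request, here is a minimal sketch of how a configurable page size could drive the splitting instead of the fixed threshold. The helper `page_row_ranges` and its `page_size` parameter are hypothetical names made up for this example, not the crate's API. It assumes rows are roughly uniform in size, defaults to the 1 MiB quoted above, and still caps the limit at the 2^31 - 2^25 bound from the snippet.

```rust
/// Hard upper bound taken from the snippet above: 2^31 - 2^25 bytes.
const MAX_PAGE_BYTES: usize = (1usize << 31) - (1usize << 25);
/// Default taken from the quoted parquet.page.size description: 1 MiB.
const DEFAULT_PAGE_BYTES: usize = 1024 * 1024;

/// Hypothetical helper (not part of the library): returns `(offset, length)`
/// row ranges such that each range is estimated to fit within the page limit.
/// Assumes every row takes about `estimated_bytes / num_rows` bytes.
fn page_row_ranges(
    num_rows: usize,
    estimated_bytes: usize,
    page_size: Option<usize>,
) -> Vec<(usize, usize)> {
    if num_rows == 0 {
        return Vec::new();
    }
    // User-provided limit, falling back to the default and capped at the hard bound.
    let limit = page_size
        .unwrap_or(DEFAULT_PAGE_BYTES)
        .min(MAX_PAGE_BYTES)
        .max(1);
    // Average bytes per row, rounded up and never zero.
    let bytes_per_row = ((estimated_bytes + num_rows - 1) / num_rows).max(1);
    // Always emit at least one row per page so the loop makes progress.
    let rows_per_page = (limit / bytes_per_row).max(1);

    (0..num_rows)
        .step_by(rows_per_page)
        .map(|offset| (offset, rows_per_page.min(num_rows - offset)))
        .collect()
}

fn main() {
    // One million rows of about 8 bytes each, using the default page size.
    let ranges = page_row_ranges(1_000_000, 8_000_000, None);
    println!("{} pages, first = {:?}", ranges.len(), ranges[0]);
}
```

Each `(offset, length)` pair could then be passed to `array.slice(offset, length)` as in the snippet above. Computing the ranges up front also avoids the recursive halving, so the `array.len() > 3` guard would no longer be needed in this sketch.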

@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Dec 13, 2022