
Enabled setting selected_rows in the runtime. #205

Merged (1 commit) on Dec 4, 2022


@RinChanNOWWW (Contributor) commented Nov 30, 2022

It's useful to let users change selected_rows during iteration.

For example:

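// Prefetch one column and apply the predicates to it to get a bitmap.
// (pre, pre_fetch_and_filter and use_bitmap stand in for user code; they are not library APIs.)
let bitmap = pre_fetch_and_filter(pre);
let bitmap = Arc::new(Mutex::new(bitmap));
// Use this bitmap to iterate the remaining column(s) and select rows.
let pages = PageReader::new_with_page_meta(
    reader,
    reader_meta,
    pages_filter,
    scratch,
    max_header_size,
)
.map(move |page| {
    page.map(|mut page| {
        // Borrow the shared bitmap instead of moving it, so the closure can run once per page.
        page.select_rows(use_bitmap(&bitmap));
        // Return the page so the iterator keeps yielding pages.
        page
    })
});

let array_iter = column_iter_to_arrays(pages, ...);
// ...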
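use_bitmap is left undefined in the example above. Below is a minimal sketch of what it could do, assuming select_rows expects page-relative runs of selected rows expressed as start/length intervals. Interval, bitmap_to_intervals and RowSelector are hypothetical names defined here, not parquet2 APIs. The Mutex-protected cursor lets each call consume the next page's slice of the column-level bitmap.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical local stand-in for a start/length run of selected rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub start: usize,
    pub length: usize,
}

/// Converts a boolean bitmap (true = keep the row) into runs of selected rows.
pub fn bitmap_to_intervals(bits: &[bool]) -> Vec<Interval> {
    let mut intervals = Vec::new();
    let mut i = 0;
    while i < bits.len() {
        if bits[i] {
            // Extend the run while consecutive rows are selected.
            let start = i;
            while i < bits.len() && bits[i] {
                i += 1;
            }
            intervals.push(Interval { start, length: i - start });
        } else {
            i += 1;
        }
    }
    intervals
}

/// Hypothetical shared state behind use_bitmap: the column-level bitmap plus a
/// cursor recording how many rows earlier pages have already consumed.
pub struct RowSelector {
    bitmap: Vec<bool>,
    offset: usize,
}

impl RowSelector {
    pub fn new(bitmap: Vec<bool>) -> Self {
        Self { bitmap, offset: 0 }
    }

    /// Returns page-relative intervals for the next `num_rows` rows and
    /// advances the cursor past them (clamped to the end of the bitmap).
    pub fn next_page(&mut self, num_rows: usize) -> Vec<Interval> {
        let end = (self.offset + num_rows).min(self.bitmap.len());
        let intervals = bitmap_to_intervals(&self.bitmap[self.offset..end]);
        self.offset = end;
        intervals
    }
}

fn main() {
    // Shared across the per-page closure, like the Arc<Mutex<...>> in the example.
    let selector = Arc::new(Mutex::new(RowSelector::new(vec![
        true, true, false, false, true, // page 1: 5 rows
        false, true, true, // page 2: 3 rows
    ])));

    for num_rows in [5, 3] {
        let rows = selector.lock().unwrap().next_page(num_rows);
        println!("{:?}", rows);
    }
    // [Interval { start: 0, length: 2 }, Interval { start: 4, length: 1 }]
    // [Interval { start: 1, length: 2 }]
}
```

Keeping the cursor in shared state is one plausible reason for the Arc<Mutex<...>> wrapper: the closure runs once per page, and each page must see only its own rows. The sketch assumes the caller knows each page's row count.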

@RinChanNOWWW (Contributor, Author) commented Nov 30, 2022

I also have a question: how can we select rows of a nested type, for example Struct?

When I add selected_rows to a Struct array, I run into this error:

Decoding Int32 "Plain"-encoded required , index-filtered parquet pages.

In my use case I can guarantee that all columns in the Struct have the same length.

cc @jorgecarleitao

This happens because the nested decoder does not accept pages with selected_rows (index-filtered pages):

// Nested decoder
fn build_state(
    &self,
    page: &'a DataPage,
    dict: Option<&'a Self::Dictionary>,
) -> Result<Self::State> {
    let is_optional =
        page.descriptor.primitive_type.field_info.repetition == Repetition::Optional;
    let is_filtered = page.selected_rows().is_some();

    // Every supported arm requires is_filtered == false, so any page that
    // carries selected_rows falls through to not_implemented.
    match (page.encoding(), dict, is_optional, is_filtered) {
        (Encoding::PlainDictionary | Encoding::RleDictionary, Some(dict), false, false) => {
            ValuesDictionary::try_new(page, dict).map(State::RequiredDictionary)
        }
        (Encoding::PlainDictionary | Encoding::RleDictionary, Some(dict), true, false) => {
            ValuesDictionary::try_new(page, dict).map(State::OptionalDictionary)
        }
        (Encoding::Plain, _, true, false) => Values::try_new::<P>(page).map(State::Optional),
        (Encoding::Plain, _, false, false) => Values::try_new::<P>(page).map(State::Required),
        _ => Err(utils::not_implemented(page)),
    }
}
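The match above only has arms with is_filtered == false, so any page carrying selected_rows reaches utils::not_implemented. As a rough sketch of what the missing required, "Plain"-encoded Int32 case has to do (not the decoder's actual implementation): decode the page as usual, then keep only the rows inside the selected intervals. This assumes a required, non-repeated column, where one row is exactly one value; for repeated (list) columns a row can span several values, so selection would have to follow repetition levels instead. Interval, decode_plain_i32 and select are hypothetical names.

```rust
/// Hypothetical local stand-in for a start/length run of selected rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub start: usize,
    pub length: usize,
}

/// Decodes a "Plain"-encoded Int32 page body: consecutive little-endian i32s.
pub fn decode_plain_i32(buffer: &[u8]) -> Vec<i32> {
    buffer
        .chunks_exact(4)
        .map(|chunk| i32::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

/// Keeps only the values whose row index falls inside one of the intervals.
/// For a required, non-repeated column, row i is value i.
pub fn select(values: &[i32], selected_rows: &[Interval]) -> Vec<i32> {
    let mut out = Vec::new();
    for iv in selected_rows {
        // Clamp to the page so an oversized interval cannot panic.
        let end = (iv.start + iv.length).min(values.len());
        let start = iv.start.min(end);
        out.extend_from_slice(&values[start..end]);
    }
    out
}

fn main() {
    // Five required Int32 values encoded as a Plain page body.
    let buffer: Vec<u8> = [10i32, 20, 30, 40, 50]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let values = decode_plain_i32(&buffer);
    let rows = [Interval { start: 0, length: 2 }, Interval { start: 4, length: 1 }];
    println!("{:?}", select(&values, &rows)); // [10, 20, 50]
}
```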

How about enabling selected_rows in nested types and asserting in the StructIterator that every column has the same length?
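The proposal can be sketched as follows, assuming every child of the Struct receives the same selected_rows and each child is a required primitive column (one row is one value). StructIterator itself is not modeled; Interval, select and select_struct_rows are hypothetical names. Returning an error instead of panicking is a design choice here; a plain assert_eq! would follow the proposal more literally.

```rust
/// Hypothetical local stand-in for a start/length run of selected rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub start: usize,
    pub length: usize,
}

/// Keeps the rows of one child column that fall inside the intervals.
fn select<T: Clone>(values: &[T], selected_rows: &[Interval]) -> Vec<T> {
    let mut out = Vec::new();
    for iv in selected_rows {
        let end = (iv.start + iv.length).min(values.len());
        let start = iv.start.min(end);
        out.extend_from_slice(&values[start..end]);
    }
    out
}

/// Filters every child with the same selection, then checks that the
/// children still line up row by row before a struct is assembled.
pub fn select_struct_rows<T: Clone>(
    children: &[Vec<T>],
    selected_rows: &[Interval],
) -> Result<Vec<Vec<T>>, String> {
    let filtered: Vec<Vec<T>> = children.iter().map(|c| select(c, selected_rows)).collect();
    if let Some(first) = filtered.first() {
        let expected = first.len();
        if let Some((i, c)) = filtered.iter().enumerate().find(|(_, c)| c.len() != expected) {
            return Err(format!("struct child {} has {} rows, expected {}", i, c.len(), expected));
        }
    }
    Ok(filtered)
}

fn main() {
    let rows = [Interval { start: 1, length: 2 }];
    let children = vec![vec![1, 2, 3, 4], vec![10, 20, 30, 40]];
    println!("{:?}", select_struct_rows(&children, &rows)); // Ok([[2, 3], [20, 30]])

    // A shorter child no longer lines up after filtering, so this is an Err.
    let ragged = vec![vec![1, 2, 3, 4], vec![10, 20]];
    println!("{:?}", select_struct_rows(&ragged, &rows));
}
```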

@jorgecarleitao (Owner)

Ohh, that is correct. Yes, we should add support for that in nested types as well.

@codecov-commenter

Codecov Report

Base: 85.12% // Head: 85.09% // Decreases project coverage by 0.03% ⚠️

Coverage data is based on head (c35aecd) compared to base (06f0675).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #205      +/-   ##
==========================================
- Coverage   85.12%   85.09%   -0.04%     
==========================================
  Files          86       86              
  Lines        8289     8292       +3     
==========================================
  Hits         7056     7056              
- Misses       1233     1236       +3     
Impacted Files     Coverage Δ
src/page/mod.rs    74.24% <0.00%> (-0.86%) ⬇️


@jorgecarleitao jorgecarleitao added the enhancement New feature or request label Dec 4, 2022
@jorgecarleitao jorgecarleitao merged commit fb08b72 into jorgecarleitao:main Dec 4, 2022
@jorgecarleitao jorgecarleitao changed the title from "Enable setting selected_rows in the runtime." to "Enabled setting selected_rows in the runtime." Dec 4, 2022

3 participants