Lance filtering not working #96

Closed
Tracked by #97
changhiskhan opened this issue Aug 11, 2022 · 0 comments · Fixed by #104

Labels: arrow (Apache Arrow related issues), bug (Something isn't working), c++ (C++ issues)


changhiskhan commented Aug 11, 2022

Use case: build a scanner that filters on annotations.name using lance.scanner.
We cannot do that directly because of #60.
So, as a workaround, I'm filtering for the image_ids using unnest and then setting up a scanner with the filtered ids.

However, the filter works on Parquet but fails on Lance.

Repro:

Works with Parquet:

import pyarrow as pa
import pyarrow.compute as pc  # needed for pc.field(...)
from pyarrow.dataset import dataset

ids = [391895, 522418, 184613, 318219, 554625, 574769, 60623, 309022, 5802, 222564]
ds = dataset('s3://eto-public/datasets/coco/coco_links.parquet')
tbl = ds.to_table(filter=pc.field('image_id').isin(ids))

Fails with Lance:

import lance
import pyarrow.compute as pc  # needed for pc.field(...)

ids = [391895, 522418, 184613, 318219, 554625, 574769, 60623, 309022, 5802, 222564]
ds = lance.dataset('s3://eto-public/datasets/coco/coco_links.lance')
tbl = ds.to_table(filter=pc.field('image_id').isin(ids))

It fails with this error:

---------------------------------------------------------------------------
ArrowIndexError                           Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 tbl = ds.to_table(filter=pc.field('image_id').isin(ids))

File ~/code/eto/lance/python/thirdparty/arrow/python/pyarrow/_dataset.pyx:331, in pyarrow._dataset.Dataset.to_table()

File ~/code/eto/lance/python/thirdparty/arrow/python/pyarrow/_dataset.pyx:2577, in pyarrow._dataset.Scanner.to_table()

File ~/code/eto/lance/python/thirdparty/arrow/python/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/code/eto/lance/python/thirdparty/arrow/python/pyarrow/error.pxi:127, in pyarrow.lib.check_status()

ArrowIndexError: Index 9 out of bounds
eddyxu self-assigned this Aug 13, 2022
eddyxu added the bug, c++, and arrow labels Aug 13, 2022
changhiskhan changed the title from "scanner filter doesn't seem to work" to "Lance filtering not working" Aug 14, 2022