ARROW-16430: [Python] Add support for reading record batch custom metadata API #13041
python/pyarrow/ipc.pxi
Outdated
batch : RecordBatch
custom_metadata: dict
@jorisvandenbossche Is this the right Numpydoc syntax when a namedtuple is returned?
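For reference, one possible way to document a namedtuple return in numpydoc style is to list each field as its own entry under `Returns`, the same way a plain tuple return is documented. The sketch below is pure Python and only illustrates the docstring layout; `BatchWithMetadata` and `read_batch_with_metadata` are hypothetical names, not the code in this PR, and this is a common convention rather than an official numpydoc rule.

```python
from collections import namedtuple

# Hypothetical result type, used only to illustrate the docstring layout.
BatchWithMetadata = namedtuple("BatchWithMetadata", ("batch", "custom_metadata"))


def read_batch_with_metadata(batches, metadata, i):
    """
    Return the batch at index ``i`` along with its custom metadata.

    Parameters
    ----------
    batches : list
        The available batches.
    metadata : list of dict
        Custom metadata, one entry per batch.
    i : int
        Index of the batch to return.

    Returns
    -------
    batch : object
        The batch stored at index ``i``.
    custom_metadata : dict
        The metadata stored alongside that batch.
    """
    return BatchWithMetadata(batches[i], metadata[i])


result = read_batch_with_metadata(["b0", "b1"], [{"id": "0"}, {"id": "1"}], 1)
```

Because the result is a namedtuple, callers can either unpack it (`batch, custom_metadata = result`) or use the field names (`result.batch`, `result.custom_metadata`).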
By the way, this PR addresses |
I tried adding the pyarrow API for
@pitrou I made some additional changes to the PR. Could you please review them? Thanks.
@niyue Sorry for the late reply, but the C++ |
@niyue Are you planning to work on this? |
I can certainly pick it up if needed! |
Hi @pitrou, sorry, I got too busy after my last commit and forgot about this issue. I am still interested in working on this and can pick it up again next week.
@niyue If/when this is ready for review, please say so :-) |
@pitrou I pushed a new commit a minute ago. I followed your suggestion to add corresponding API for |
@pitrou I am not sure whether you saw my previous comment. I made some changes, and this PR is ready for review. Could you please take a look? Thanks.
@niyue Sorry, I had overlooked it. I'll take a look when I can. @jorisvandenbossche would you like to review this too? |
Sorry for the delay! Here are some more comments.
No problem. I will check them out and see if I can get them addressed. Thanks for the review.
There were some timeouts in some CI jobs. I've restarted them just in case. If the timeouts still reproduce, you should first rebase on master to pick up any upstream fixes.
Looks good, just two minor doc formatting comments
This looks like a nice addition. One thing I am wondering about more generally (potentially for future JIRAs/PRs): when working with custom metadata, would it be useful to also allow inspecting the custom metadata of a given batch without loading the batch itself? You could then, for example, check the metadata before deciding whether to read the batch.
Perhaps, but that would have to be done on the C++ side first. Also, I would wait for users to actually request it. |
@pitrou Sorry for the late response. There are two failed CI builds, and they don't seem related to this PR. I rebased onto the latest master branch anyway.
…w so that pyarrow can read record batch along with its custom metadata.
@pitrou Could you please review the PR again to see whether anything still needs to change? Thanks.
Really sorry for the delay @niyue ! I've pushed a couple minor changes and will merge if CI is green.
Thank you for the work!
No problem @pitrou . Thanks for the review and the fixes. |
In ARROW-16131, C++ APIs were added so that users can read and write record batch custom metadata in IPC files. This PR adds the corresponding pyarrow APIs so that Python users can take advantage of them, which addresses ARROW-16430.