Replace Arc with Box in ArrowArray for FFI structs #1432
Codecov Report
@@            Coverage Diff             @@
##           master    #1432      +/-   ##
==========================================
- Coverage   82.67%   82.67%   -0.01%
==========================================
  Files         185      185
  Lines       53822    53820       -2
==========================================
- Hits        44500    44498       -2
  Misses       9322     9322
Thanks for your effort, the change looks good to me. Only one comment in the code.
let data = ArrayData::try_from(array)?;
let data = ArrayData::try_from(&array)?;
// Avoid dropping the `Box` pointers and trigger the `release` mechanism.
let _ = ffi::ArrowArray::into_raw(array);
Why do we need to change ArrayData::try_from() to take a reference (&)? This line seems very tricky.
And since we are changing the ArrayData::try_from() API, all other users may also need to change their code and add this tricky line.
Since ArrayData::try_from() moves the ffi::ArrowArray, it will drop the ArrowArray and trigger release for the structs (as they are just Box pointers now). Users cannot prevent this from happening. For example, without this change, our internal use case got a SIGSEGV.
So I changed it to take a borrowed reference to avoid dropping/releasing there.
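The difference between moving and borrowing described above can be sketched in isolation. This is a minimal illustration only: `FfiArray`, `len_by_value`, `len_by_ref`, and `release_counts` are hypothetical names, not arrow-rs APIs, and the `Drop` impl merely counts calls where the real struct would invoke the C `release` callback.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for `ffi::ArrowArray`; not the real arrow-rs type.
struct FfiArray {
    values: Vec<i32>,
    releases: Rc<Cell<u32>>,
}

impl Drop for FfiArray {
    fn drop(&mut self) {
        // Stands in for calling the C `release` callback.
        self.releases.set(self.releases.get() + 1);
    }
}

/// Takes ownership, so `array` is dropped (released) when the call returns.
fn len_by_value(array: FfiArray) -> usize {
    array.values.len()
}

/// Only borrows, so the caller decides when the release happens.
fn len_by_ref(array: &FfiArray) -> usize {
    array.values.len()
}

/// Returns the release counts seen right after a by-value and a by-reference call.
fn release_counts() -> (u32, u32) {
    let moved = Rc::new(Cell::new(0u32));
    let borrowed = Rc::new(Cell::new(0u32));

    let a = FfiArray { values: vec![1, 2, 3], releases: Rc::clone(&moved) };
    len_by_value(a);
    let after_move = moved.get();

    let b = FfiArray { values: vec![1, 2, 3], releases: Rc::clone(&borrowed) };
    len_by_ref(&b);
    let after_borrow = borrowed.get();

    (after_move, after_borrow)
}

fn main() {
    let (after_move, after_borrow) = release_counts();
    println!("by value: {after_move} release(s), by reference: {after_borrow}");
}
```

With the by-value signature the callee's scope ends the struct's life, which is why a borrowed reference leaves the release timing to the caller.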
I don't think so. Arc::from(Box::from_raw(array as *mut FFI_ArrowArray)) is Arc<Box<FFI_ArrowArray>>. Then the raw pointer is *Box<FFI_ArrowArray>, but you treat it as *FFI_ArrowArray.
Oh, I'm wrong. I wasn't aware that there is a from(v: Box<T>) API in Arc.
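For reference, the standard library's `impl From<Box<T>> for Arc<T>` yields an `Arc<T>`, not an `Arc<Box<T>>`: the value is moved into a fresh `Arc` allocation and the `Box` allocation is freed by Rust's allocator. A small sketch, where `FfiArray` and `arc_from_box` are hypothetical stand-ins and not arrow-rs code:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `FFI_ArrowArray`.
#[derive(Debug, PartialEq)]
struct FfiArray {
    length: i64,
}

/// Converts a boxed value into a shared one via `Arc::from`.
fn arc_from_box(length: i64) -> Arc<FfiArray> {
    let boxed: Box<FfiArray> = Box::new(FfiArray { length });
    // The result type is `Arc<FfiArray>`; the value is moved into the
    // Arc's own allocation and the Box's allocation is deallocated.
    let shared: Arc<FfiArray> = Arc::from(boxed);
    shared
}

fn main() {
    let shared = arc_from_box(7);
    // `Arc::as_ptr` therefore points at the struct itself, not at a Box.
    let raw: *const FfiArray = Arc::as_ptr(&shared);
    println!("{:?} at {:p}", shared, raw);
}
```

Because the conversion deallocates the original `Box` allocation, it is only appropriate when that memory really came from Rust's allocator.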
Previously you didn't need it. As the Arc is kept in the Buffer created for the Array data, you can rely on deallocation of the Buffer to call release of such FFI structs.
But Box cannot give us that benefit. So it makes the management more explicit and reliant on users. We need to keep these structs alive, so that release won't be called, until we no longer need the Array data (Buffer).
The code is at ArrowArrayRef.to_data. It creates an ArrayData from an ArrowArray(Ref). You can follow buffers -> create_buffer -> Buffer::from_unowned.
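The ownership pattern described here, where a buffer holds an `Arc` to its owner so that the release only runs once the last buffer is gone, can be sketched as follows. The `FfiArray` and `Buffer` types below are simplified, hypothetical stand-ins; the real arrow-rs `Buffer::from_unowned` has a different signature and is not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for an FFI struct whose drop triggers `release`.
struct FfiArray {
    releases: Arc<AtomicUsize>,
}

impl Drop for FfiArray {
    fn drop(&mut self) {
        // Stands in for calling the C `release` callback.
        self.releases.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical buffer over foreign memory that keeps its owner alive.
struct Buffer {
    _owner: Arc<FfiArray>,
}

/// Returns the release count before and after the buffer is dropped.
fn releases_before_and_after_buffer_drop() -> (usize, usize) {
    let releases = Arc::new(AtomicUsize::new(0));
    let array = Arc::new(FfiArray { releases: Arc::clone(&releases) });
    let buffer = Buffer { _owner: Arc::clone(&array) };

    // The original handle goes away, but the buffer still holds a reference.
    drop(array);
    let before = releases.load(Ordering::SeqCst);

    // Dropping the last holder is what finally triggers the release.
    drop(buffer);
    let after = releases.load(Ordering::SeqCst);

    (before, after)
}

fn main() {
    let (before, after) = releases_before_and_after_buffer_drop();
    println!("before buffer drop: {before}, after buffer drop: {after}");
}
```

With `Box` there is no shared count, so whoever holds the box must outlive every buffer derived from it.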
I see, thanks for the explanation.
I'm wondering if we can still use Arc but drop the "envelope memory" allocated for the struct holding the actual pointers, for the input array and schema. For example:
pub unsafe fn try_from_raw(
    array: *const FFI_ArrowArray,
    schema: *const FFI_ArrowSchema,
) -> Result<Self> {
    if array.is_null() || schema.is_null() {
        return Err(ArrowError::MemoryError(
            "At least one of the pointers passed to `try_from_raw` is null"
                .to_string(),
        ));
    };
    let array_mut = array as *mut FFI_ArrowArray;
    let schema_mut = schema as *mut FFI_ArrowSchema;
    let array_data = std::ptr::replace(array_mut, FFI_ArrowArray::empty());
    let schema_data = std::ptr::replace(schema_mut, FFI_ArrowSchema::empty());
    std::ptr::drop_in_place(array_mut);
    std::ptr::drop_in_place(schema_mut);
    Ok(Self {
        array: Arc::new(array_data),
        schema: Arc::new(schema_data),
    })
}
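A self-contained sketch of the same idea may help show why the source struct's release is not triggered. It assumes a simplified, hypothetical `FfiArray` with an optional `release` function pointer, and `take_from_raw`, `count_release`, and `demo` are illustrative names only; none of this is the real `FFI_ArrowArray` definition.

```rust
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

static RELEASES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical, simplified stand-in for `FFI_ArrowArray`.
struct FfiArray {
    length: i64,
    release: Option<fn(&mut FfiArray)>,
}

impl FfiArray {
    /// An empty struct has no release callback, so dropping it does nothing.
    fn empty() -> Self {
        Self { length: 0, release: None }
    }
}

impl Drop for FfiArray {
    fn drop(&mut self) {
        if let Some(release) = self.release.take() {
            release(self);
        }
    }
}

fn count_release(_array: &mut FfiArray) {
    RELEASES.fetch_add(1, Ordering::SeqCst);
}

/// Moves the contents out of `*src` and leaves an empty struct behind.
/// Safety: `src` must be non-null, aligned, and point to a valid `FfiArray`.
unsafe fn take_from_raw(src: *mut FfiArray) -> FfiArray {
    unsafe { ptr::replace(src, FfiArray::empty()) }
}

/// Returns (length, releases after dropping the source, releases after dropping the copy).
fn demo() -> (i64, usize, usize) {
    let start = RELEASES.load(Ordering::SeqCst);
    // `ManuallyDrop` stands in for memory that Rust must not free itself.
    let mut foreign =
        ManuallyDrop::new(FfiArray { length: 5, release: Some(count_release) });
    let src: *mut FfiArray = &mut *foreign;

    let taken = unsafe { take_from_raw(src) };
    // The source is now empty, so dropping it in place calls no callback
    // and deallocates nothing.
    unsafe { ptr::drop_in_place(src) };
    let after_source = RELEASES.load(Ordering::SeqCst) - start;

    let length = taken.length;
    drop(taken); // the moved-out value is the one that releases
    let after_taken = RELEASES.load(Ordering::SeqCst) - start;

    (length, after_source, after_taken)
}

fn main() {
    println!("{:?}", demo());
}
```

The key property is that `ptr::replace` and `drop_in_place` never hand the source pointer to Rust's allocator, so memory allocated elsewhere stays under its original owner's control.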
It looks close to the previous suggestion, Arc::from, as it also copies bytes from the source structs. But that one caused a SIGSEGV. I will try to test this.
I've tried it. It seems okay. I think Arc::from did not work previously because it calls the allocator to deallocate the memory allocation. As the memory is allocated by Java in our case, we cannot let Rust deallocate it.
std::ptr::drop_in_place seems to only trigger dropping. As we replace the source structs with empty ones, it won't trigger release. I think this is close to #1436, which cleans up the release field of the source structs after cloning them. Here we in fact still clone, but only internally, and don't expose clone.
Looks good to me. Thanks @sunchao.
cc @alamb @wangfenjin WDYT? Do you agree with this approach?
The suggested approach is at #1449.
Note that because the struct pointers are now That's why I personally prefer alternative
I think the "be explicit about memory management" is the typical Rust mantra in this case, so I would prefer the Perhaps in your use of the FFI structs, you could use
#1442 tracks the patch releases
@alamb Thanks. It makes sense. Then I think the
This can be closed now.
Which issue does this PR close?
Closes #1425.
Rationale for this change
What changes are included in this PR?
Clone from FFI_ArrowArray and FFI_ArrowSchema
array and schema in ArrowArray from Arc to Box type
Are there any user-facing changes?
TryFrom<ffi::ArrowArray> for ArrayData is changed to TryFrom<&ffi::ArrowArray> for ArrayData.
is changed to
fn owner(&self) -> &Arc<FFI_ArrowArray>; in trait ArrowArrayRef is changed to fn owner(&self) -> Arc<&FFI_ArrowArray>;.