-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC Extensions #28368
GC Extensions #28368
Conversation
src/gc.c
Outdated
@@ -2286,7 +2328,7 @@ mark: { | |||
obj16 = (gc_mark_obj16_t*)sp.data; | |||
goto obj16_loaded; | |||
} | |||
else { | |||
else if (layout->fielddesc_type == 2) { | |||
// This is very uncommon | |||
// Do not do store to load forwarding to save some code size | |||
assert(layout->fielddesc_type == 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice this before, but I guess this assert
could now be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbehrends: please address this, too?
src/gc-pages.c
Outdated
} | ||
support_conservative_scans = 1; | ||
JL_UNLOCK_NOGC(&conservative_scan_lock); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that all this function does in the end (at least right now) is to enable the support_conservative_scans
and jl_gc_cleanup_pages
flags. So perhaps the Julia team would prefer a function name that better reflects that, e.g. jl_gc_enable_clean_pages
. I think the main reason we went with the current name is that hypothetically, it could do additional work in the future related to conservative scanning. Anyway, in the end, as log as get this functionality in some way, we will of course adapt to whichever names you might prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK in general except for the second last commit. I don't think there's a way around hooking into implementation details (at least in the sense for every other pieces of code) but the coupling here is a little too much and too unstable.
I also don't think you need any of these since you can get all these info from the page metadata.
During marking, when you get a pointer pointing to the pool pages,
- Find the cell boundary
- If the cell is marked, then you're done
- If
nfree
is0
, there's either no free cell in the page, in which case the cell you've just found is a valid object, or the page is being allocated. If the page is being allocated, you can tell if the cell is valid or not by comparing its address to the current allocation point on the thread the page belongs to. - Now the only two possibilities are free cell or a young object on a page that has not been touched since last sweep. In this case, you can use the age to tell the difference. The only issue is the reset age done in the finalizer marking so the only change you'll need to make is to disable the age reset there.
src/gc.c
Outdated
static jl_gc_callback_list_t *gc_cblist_pre_gc; | ||
static jl_gc_callback_list_t *gc_cblist_post_gc; | ||
|
||
#define gc_invoke_callbacks(kind, args) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro symbol concatenation is still here.....
src/gc.c
Outdated
desc->sweepfunc(v); | ||
} | ||
else { | ||
objs->items[p++] = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent.
src/gc-pages.c
Outdated
@@ -21,6 +21,19 @@ extern "C" { | |||
|
|||
static int block_pg_cnt = DEFAULT_BLOCK_PG_ALLOC; | |||
static size_t current_pg_count = 0; | |||
static int support_conservative_scans = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this commit is the kind of change that will be very hard to maintain... It relies on a lot of the internal implementation details that can otherwise be changed without any issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I can now get rid of this part entirely by implementing your other suggestion for finding references to pool allocations, as that would (I think) eliminate the need to zero pages entirely. I'll put the code in and get back to you ASAP with an update (including also the other requested changes).
The changes turned out to be a bit more intricate than I had expected. They are all contained in a single new commit (for now) instead of integrating them with the existing commits to make review of the additional changes easier. The "no free objects on page" case must actually be split up; if a page is being allocated from, then Even with all the options suggested to identify pointers to the interior of a pool object, we still need to distinguish between a freshly allocated object and a freelist entry. We can identify freelist entries fairly easily, however, because the header points to another header rather than to another object (i.e. A remaining problem was that the linked list in The API call I also removed |
I don't think that'll work reliably with small cell size and that's why I said,
This should be reflected in the comment. |
The only concern here seems to be
This is where I'm not sure how that would work. In We can use the age bit to tell if a cell survived A way to solve this (and which seems to work, based on a quick test) is to compare the pointer to
It already is, though I think I'll clarify the wording. |
I don't know if this information is helpful, but some of this sounds like what |
I've pushed the relevant changes to use age instead as the arbiter of whether we're dealing with an allocated or free cell. A couple of potential concerns remain, though. 1. I'm not sure if and when a page in 2. I had to change the value of |
@yuyichao is this more like what you had in mind? Any remaining concerns? I see that CircleCI tests failed, with an error message saying "ERROR: LoadError: did not find test files for the following tests: OldPkg/pkg". Not sure what to do about that, any hints? (Perhaps @rbehrends already knows |
@fingolfin I suspect that I need to rebase this onto a more recent version, it may have to do with Pkg changes? The tests themselves see to work (unsurprisingly, as I ran them myself before pushing). |
Yes, circle CI is not very smart and using an old config file. A rebase on master will fix it. |
178b967
to
afa86d7
Compare
Having rebased onto the current master, everything seems to pass now. |
src/gc.c
Outdated
static jl_gc_callback_list_t *gc_cblist_notify_external_alloc; | ||
static jl_gc_callback_list_t *gc_cblist_notify_external_free; | ||
|
||
#define gc_invoke_callbacks(kind, args) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not fixed. (Again, function like macro is fine, but not name concatination).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just to clarify: you prefer that we replicate this code 6 times? The justification being that this way one can grep for e.g. gc_cblist_root_scanner
to find where it is used?
A simple alternative to achieve that would be to add a comment to each of the 6 invocations of gc_invoke_callbacks
, e.g.
// iterate over gc_cblist_root_scanner
gc_invoke_callbacks(notify_external_alloc, (v, allocsz));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just to clarify: you prefer that we replicate this code 6 times?
That'll be better, yes. But you don't have to, The only bad thing about this is the symbol concatenation for the variable and type names, just spell them out explicitly will be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I had seen your earlier comment, but thought that was about the public API only, which I changed accordingly (whereas this use is internal). This is easy enough to change, too, but to explain why I did it: I wanted to keep the code typesafe so that list name and type always match. This is not very critical, though, as this is only used in a handful of places and only in one file, so if avoiding symbol concatenation is the bigger priority that's also not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know type safety is the issue. There are enough of these in the GC code that I hope we could use c++ here but fortunately most of these could be solved without too much more typing and without relying on a lot of macro tricks so far.............
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I notice that there are quite a few macros in Julia which use preprocessor concatenation, and arguably for symbols were this is far more inconvenient than for this fairly localized and special one. E.g. I just spent some time tracking down where jl_box_uint64
is defined. Other examples include JL_TYPECHK
and code for intrinsics... So it's not quite clear to me why those are OK, but ours is not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congratulations. That exact macro is the very reason I hate this kind of macro so much. Unless I actually want to change something about these macros I'm not a fan of just changing those but I also don't want to introduce new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand now: you really want all gone, and the existing ones are "legacy"
src/gc.c
Outdated
return NULL; | ||
} | ||
// Not a freelist entry, therefore a valid object. | ||
valid_object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less indent for the label, see other places in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's easy enough.
// a lower address will be an allocated object, and for cells | ||
// with the same or a higher address, the corresponding age | ||
// bit will reflect whether it's on the freelist. | ||
// Age bits are set in sweep_page() and are 0 for freelist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, for this to work reliably you still need to disable the age reset in the finalizer marking. There should be only a single place where mark_reset_age
is set to 1 so just disable that. Also, in general, it's still good to have an API for enabling support for conservative scanning. That function should also return whether the conservative scanning support was enabled and you should document that if the setup doesn't guarantee that the support was enabled early enough, the caller should do a full GC afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I need to unpack that, as I think I simply misunderstood you before (thinking you were referring to sweeping rather than finalizers). I take it now that mark_reset_age = 1
should be disabled for conservative scanning only?
I had removed the function for enabling conservative scanning because it had become a no-op; but it does make sense to have one regardless in case future changes necessitate having one. And if I understand you correctly, the mark_reset_age
change is for the conservative case alone and should be left in place for the normal case, so we'd have to put code in there again anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it now that mark_reset_age = 1 should be disabled for conservative scanning only?
Yes. Sorry if it wasn't clear.
but it does make sense to have one regardless in case future changes necessitate having one.
That's exactly what I'm saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it does make sense to have one regardless in case future changes necessitate having one.
And in particular, the behavior of "if the function returns 0 then the caller should do a full GC if it wasn't called right after initialization" should be a behavior that is reasonably easy to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, does that mean mark_reset_age
is "just" an optimization, which is thus safe to disable for us? I wonder how much impact disabling it has, esp. compared to our previous version of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's just an optimization. It could delay the freeing of old finalized objects but should not have any other effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuyichao Thanks for the explanation, and of course all the feedback
@rbehrends Perhaps we could extend the comment at the declaration of mark_reset_age
to state that as part of this PR, so that future people looking at this will have an easier time figuring it out :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed semantics for jl_gc_enable_conservative_gc_support()
actually run into problems in the multit-threaded case, as other threads may do work (such as allocations) in between the call and the garbage collection, leading to a race condition. I would therefore propose to do the full collection as part of the jl_gc_enable_conservative_gc_support()
semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine too. There shouldn't be a issue though since as long as you make sure you don't start doing things that requires conservative scanning before running the GC, the race is benign. Not requiring a external GC does make the API simpler so I'm fine with that. I brought it up only in case you want to reduce the startup delay a little bit.
src/julia_internal.h
Outdated
@@ -289,7 +289,7 @@ JL_DLLEXPORT jl_value_t *jl_gc_alloc(jl_ptls_t ptls, size_t sz, void *ty); | |||
# define jl_gc_alloc(ptls, sz, ty) jl_gc_alloc_(ptls, sz, ty) | |||
#endif | |||
|
|||
#define jl_buff_tag ((uintptr_t)0x4eade800) | |||
#define jl_buff_tag ((uintptr_t)0x4eade000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something that I introduced for the reasons mentioned in an earlier comment. Basically, it is possible for a spurious reference to an object with the type jl_buff_tag
to occur with conservative scanning; such an object must not be passed to jl_gc_mark_queue_obj()
and similar functions that try to treat the header word as a jl_datatype_t *
that they can derefence. So we want to make sure that jl_buff_tag
cannot accidentally be mistaken for a value that can legitimately occur in the object header (which is extremely unlikely, but at least theoretically possible). By making it a multiple of GC_PAGE_SZ
, we avoid that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you should (edit: NOT) be relying on that to distinguish allocated and unallocated cell anymore?
Also, just realized that I forgot to mention, even with the conservative scan, any code that is julia-aware has to use the GC frame explicitly. More specifically, any code that creates and access julia array in C code has to do that since otherwise the array object itself might not be anywhere on the stack while the buffer is still being used which will cause 1) (from the current implementation) the buffer won't be scanned and 2) the buffer itself doesn't even contain enough information about what's in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you should (edit: NOT) be relying on that to distinguish allocated and unallocated cell anymore?
Hmm. I was under the impression that objects with jl_buff_tag
can still happen to occur as a live object. While they get special treatment in the mark loop, I believed that they can still occur in other contexts (e.g. when allocated off pool->newpages
) in ways that make them indistinguishable from normal allocations.
Also, just realized that I forgot to mention, even with the conservative scan, any code that is julia-aware has to use the GC frame explicitly. More specifically, any code that creates and access julia array in C code has to do that since otherwise the array object itself might not be anywhere on the stack while the buffer is still being used which will cause 1) (from the current implementation) the buffer won't be scanned and 2) the buffer itself doesn't even contain enough information about what's in there.
I think this is something that simply needs to be documented. Namely, that arrays need to be kept visible to the Julia GC even if they are not visible on the stack anymore if you are accessing its elements. E.g. using JL_GC_PUSH1()
and friends, storing them in a Julia object or a root while accessing elements, or some such. But yes, I think this is something that requires explicit documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I was under the impression that objects with jl_buff_tag can still happen to occur as a live object.
Ah, so what you want to distinguish jl_buff_tag
from a real tag reliably. I though you needed to use this to distinguish between a unallocated and allocated cell. This use is fine then, but it should be documented near the definition of the jl_buff_tag
, and possibly a "see also" here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and possibly a "see also" here as well.
Oh, and by "here" I mean where you check that in the code. I thought that's where this comment thread is....
// Insert free page after first page. | ||
// This prevents unnecessary fragmentation from multiple pages | ||
// being allocated from at the same time. Instead, objects will | ||
// only ever be allocated from the first object in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and this comment is still missing a mention of the relation to conservative scanning that I requested last time. That won't be clear for future reader out of the context of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. Will do that. I hadn't realized that you wanted me to make the connection to conservative scanning and thought you simply wanted the logic explained. But putting it also in the context of conservative scanning makes perfect sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbehrends hmm, this comment has not yet been addressed, has it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has, the comment now explicitly notes thatjl_gc_internal_obj_base_ptr()
(which is part of the conservative marking machinery) relies on it.
I believe I have addressed everything now (barring a fresh rebase of the commits). Tests pass locally, too. |
src/gc.c
Outdated
{ | ||
size_t p = 0; | ||
for (size_t i = 0; i < objs->len; i++) { | ||
jl_value_t *v = (jl_value_t *)(objs->items[i++]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is i incremented twice on purpose ? If so, IMHO a comment should explain this. If not..,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's a bug (and would result in a resource leak). I've addressed this along with the other comments and amended the existing commits.
src/gc.c
Outdated
|
||
static void gc_sweep_foreign_objs(void) | ||
{ | ||
for (int i = 0;i < jl_n_threads;i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert spaces after semicolon?
66a98c4
to
ed02b83
Compare
src/julia_gcext.h
Outdated
|
||
// Types for mark and sweep functions. | ||
// We make the cache and sp parameters opaque so that the internals | ||
// do not get exposed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, cache and sp? Outdated comment?
src/julia_gcext.h
Outdated
|
||
// Kinds of callbacks. For each kind of callback, there must be | ||
// a corresponding type with the same name, but in all lowercase, | ||
// and with a "_t" suffix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is outdated and now not very helpful. Just remove it?
src/julia_gcext.h
Outdated
// marking. Conservative marking will treat any machine word that points | ||
// to an object (potentially including its interior) as a valid | ||
// reference, regardless of whether it actually is one or just an | ||
// integer that happens to match an actual address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this comment sounds as if conservative marking was enabled by this switch; the fine distinction that only "support" for it is enabled, and that the caller needs to perform additional work (and which) is easy to miss. So perhaps nake that clear?
Here, being aware of the Julep would be useful... Would it be appropriate to insert a reference to it in a comment?
if m === nothing | ||
return false | ||
else | ||
num = m[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, an unwanted tab slipped in. It should be fixed now.
ed02b83
to
d4f0aa6
Compare
d4f0aa6
to
bd880ea
Compare
@yuyichao @Keno From our perspective, this PR should be ready now; at least we are not aware of any remaining unresolved issues. The failed Travis test seems to be a fluke (it failed running Of course we can clean up the commits in this PR if you like us to, but I figured that only makes sense if you are happy with it now; if you want further changes, please let us know. |
src/gc.c
Outdated
@@ -2125,6 +2292,9 @@ mark: { | |||
int stkbuf = (ta->stkbuf != (void*)(intptr_t)-1 && ta->stkbuf != NULL); | |||
int16_t tid = ta->tid; | |||
jl_ptls_t ptls2 = jl_all_tls_states[tid]; | |||
ptls->last_gc_mark_sp = &sp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned this in an early comment. This leaks the address of sp
, should be replaced by copy instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean by "replaced by copy" here?
And I don't think you mentioned this before (nor could I find such a comment). To the contrary, our first version of the Julep and the code in our branch (from before the PR was submitted) did not have last_gc_mark_sp
but rather used a more complicated approach; it was then suggested to us by you to move this into the TLS (for which we were quite grateful, as it simplified some things considerably).
Anyway, perhaps you could elaborate on what you imagine should replace this (while ensuring that jl_gc_mark_queue_obj
and jl_gc_mark_queue_objarray
keep working)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it's better to just explicitly synchronize them at those point instead of leaking
&sp
from the beginning of the function.
That's actually what i mean by explicitly synchronize instead of leaking address. So instead of
ptls->last_gc_mark_sp = &sp;
<external call>
Do
ptls->last_gc_mark_sp = sp;
<external call>
sp = ptls->last_gc_mark_sp;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure, we can do that.
However (with the caveat that I have not yet had a chance to try the change), I think that means either including gc.h from julia_threads.h to get gc_mark_sp_t
, or else using a hard coded buffer size there matching the size of gc_mark_sp_t
(and then having a static_assert
somewhere to ensure that value stays correct). Which do you prefer? Resp. if you have an alternate idea, please let us know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def of gc_mark_sp_t
could be moved to julia_thread.h
and renamed to jl_gc_mark_sp_t
. That's the approach used by jl_gc_mark_cache_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About copying: this could introduce measurable overhead, as the struct consists of eight words. More importantly, as the contents of the struct change during marking, this would be problematic during root scanning, as there would be no way to retrieve any changes to the struct. And the internals of the struct would be exposed, anyway.
An alternative idea would be to explicitly document when the field is valid.
Renaming the type wouldn't be a problem, of course. Part of the reason why I made the field a void *
was for purposes of information hiding, i.e. to make it opaque and to not expose its internals. But that could also be done with an opaque struct pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbehrends well, can you measure it? If you implement this in a commit atop the PR, we can easily drop it later on if it turns out to be too bad.
I don't understand what problem with "contents of the struct change during marking" is; and what changes you think need to be "retrieved". Could you elaborate?
As to you "An alternative idea", I also don't quite see to what it is meant to be an alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About copying: this could introduce measurable overhead, as the struct consists of eight words.
The issue is that the control flow in this function is very unfriendly to the compiler, to put it mildly, so the compiler won't be able to do any escape analysis following the control flow. The sp
is the most accessed variable in the function so I do not want to compromise it's performance. (Note that all the fast path in this function is fully inlined and sp
doesn't leak to any other functions). Also note that since there aren't that many callee saved registers on x86, all of the function calls are already surrounded by many register spills which are basically copying things around.
as there would be no way to retrieve any changes to the struct
That's why they are copied back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see the problem.
That's why they are copied back?
Hmm, that would be two struct copies per mark operation. I really have to benchmark that to see what happens. It may be necessary to add to the API to permit bulk operations. I'll go and do a provisional implementation of this approach and see how that turns out, performance-wise.
Edit: Nevermind, I think I know how to cut down on the copying quite a bit. I still have to do the implementation & benchmarking to see how much of an impact it actually has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial benchmarking seems to indicate that the overhead remains fairly small, even if you have lots of objects with custom mark functions. I still need to figure out why I now have a rare and difficult to reproduce crash as the result of these changes, though.
|
||
int old = jl_astaggedvalue(new_obj)->bits.gc & 2; | ||
ptls->last_gc_mark_sp = &sp; | ||
uintptr_t young = desc->markfunc(ptls, new_obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
src/gc.c
Outdated
@@ -2468,6 +2648,9 @@ static int _jl_gc_collect(jl_ptls_t ptls, int full) | |||
|
|||
// 3. walk roots | |||
mark_roots(gc_cache, &sp); | |||
ptls->last_gc_mark_sp = &sp; | |||
gc_invoke_callbacks(jl_gc_cb_root_scanner_t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
src/gc.c
Outdated
if (jl_is_initialized()) { | ||
JL_LOCK_NOGC(&conservative_gc_lock); | ||
int result = support_conservative_marking; | ||
support_conservative_marking = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jl_atomic_exchange
and get rid of the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's easy enough.
src/gc.c
Outdated
|
||
JL_DLLEXPORT int jl_gc_conservative_gc_support_enabled() | ||
{ | ||
return support_conservative_marking; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the other one has lock/atomic, this one should be as well mainly for consistency. jl_atomic_load_acquire
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. It's not strictly necessary, as this is a benign data race, but I agree that it would be less confusing.
I've updated this PR now. @rbehrends will hopefully work on adding a task scanner test (but IMHO that could also be added in a later PR, i.e. it should IMHO not delay merging of this PR; but of course you may disagree ;-) |
I've added a test for the task scanner functionality now. |
@vtjnash in 082d7d2 you added this in void *stkbuf = ta->stkbuf;
if (ta == thread_task && ptls->copy_stack)
gc_scrub_range(ptls2->stackbase, ptls2->stacksize);
else if (stkbuf)
gc_scrub_range((char*)stkbuf, (char*)stkbuf + ta->bufsz); I am confused by this -- isn't the second argument to
So perhaps in the first snippet, what is really meant is this: (?) gc_scrub_range(ptls2->stackbase - ptls2->stacksize, ptls2->stackbase); This would also fit with code I see in the various ptls->stackbase = stkbuf + ssize;
ptls->stacksize = ssize; However, there, usually this is contained in Is there any documentation on all this? The code is almost completely undocumented, and I have a hard time reverse engineering it. |
95b5ddd
to
8f17ca2
Compare
The underlying problem for the crash was that in Right now, I believe that this needs an abstraction to determine a task's stack area regardless. I've added a function |
530838d
to
7942395
Compare
See f6879e1 where we fixed that. |
@StefanKarpinski I don't think this is necessary. As I wrote above, the core issue appeared to be that the stack buffer of the root task of the main thread was manipulated in a way that made the start of the stack buffer point to the middle of nowhere. I addressed that in 83bc53b. |
Note that because these stacks "grow down", start and end are reversed from usual. So here, we're intentionally manipulating the end of the stack buffer to point at the middle of nowhere (well, actually into the guard pages at the bottom). But you should never scan that far down the stack, just start at your best guess for the start and end at the current frame pointer for that thread. We can't actually determine where the stack starts and ends on most operating systems. Although we could bound it more precisely than we currently do (because we know the precise size of it and the address of one point that's somewhere in the middle). |
Yeah, so from our end, everything is working fine right now, so if this PR was merged as is, we'd be happy. But of course we'll also try to address any further questions or change requests you may have! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only done a cursory review of the latest, but I did look over the original proposal fairly closely and that looked reasonable to be, so if @yuyichao is ok with the implementation as is, I'm happy to have this merged. If anything else comes up we can do that in follow-on PRs.
Separately, you should be commended on an excellent PR-submitting process. Getting features like this through the review process is non trivial and a lot of work, but I think you've been most excellent about it. |
Thank you for the kind words, and thanks to all involved for the support, esp. @yuyichao for many helpful remarks and careful reviewing! |
I have read through the proposal and it seems like exactly what I need. I have a question, though: if I use |
You must only call |
I see. Thanks for clarifying. Can't wait until the release :) |
This pull request implements the GC extension proposal described in this Julia Enhancement proposal.
See the Julep above for a full description of the API and semantics.