GC Extensions #28368

rbehrends · 2018-07-31T04:17:45Z

This pull request implements the GC extension proposal described in this Julia Enhancement proposal.

See the Julep above for a full description of the API and semantics.

fingolfin · 2018-07-31T13:17:53Z

src/gc.c

@@ -2286,7 +2328,7 @@ mark: {
                obj16 = (gc_mark_obj16_t*)sp.data;
                goto obj16_loaded;
            }
-            else {
+            else if (layout->fielddesc_type == 2) {
                // This is very uncommon
                // Do not do store to load forwarding to save some code size
                assert(layout->fielddesc_type == 2);


I didn't notice this before, but I guess this assert could now be removed?

@rbehrends: please address this, too?

fingolfin · 2018-07-31T13:24:18Z

src/gc-pages.c

+    }
+    support_conservative_scans = 1;
+    JL_UNLOCK_NOGC(&conservative_scan_lock);
+}


Note that all this function does in the end (at least right now) is to enable the support_conservative_scans and jl_gc_cleanup_pages flags. So perhaps the Julia team would prefer a function name that better reflects that, e.g. jl_gc_enable_clean_pages. I think the main reason we went with the current name is that hypothetically, it could do additional work in the future related to conservative scanning. Anyway, in the end, as log as get this functionality in some way, we will of course adapt to whichever names you might prefer.

yuyichao

OK in general except for the second last commit. I don't think there's a way around hooking into implementation details (at least in the sense for every other pieces of code) but the coupling here is a little too much and too unstable.

I also don't think you need any of these since you can get all these info from the page metadata.

During marking, when you get a pointer pointing to the pool pages,

Find the cell boundary
If the cell is marked, then you're done
If nfree is 0, there's either no free cell in the page, in which case the cell you've just found is a valid object, or the page is being allocated. If the page is being allocated, you can tell if the cell is valid or not by comparing its address to the current allocation point on the thread the page belongs to.
Now the only two possibilities are free cell or a young object on a page that has not been touched since last sweep. In this case, you can use the age to tell the difference. The only issue is the reset age done in the finalizer marking so the only change you'll need to make is to disable the age reset there.

yuyichao · 2018-07-31T17:32:26Z

src/gc.c

+static jl_gc_callback_list_t *gc_cblist_pre_gc;
+static jl_gc_callback_list_t *gc_cblist_post_gc;
+
+#define gc_invoke_callbacks(kind, args) \


The macro symbol concatenation is still here.....

yuyichao · 2018-07-31T17:33:26Z

src/gc.c

+            desc->sweepfunc(v);
+        }
+        else {
+          objs->items[p++] = v;


yuyichao · 2018-07-31T17:38:14Z

src/gc-pages.c

@@ -21,6 +21,19 @@ extern "C" {

 static int block_pg_cnt = DEFAULT_BLOCK_PG_ALLOC;
 static size_t current_pg_count = 0;
+static int support_conservative_scans = 0;


So this commit is the kind of change that will be very hard to maintain... It relies on a lot of the internal implementation details that can otherwise be changed without any issue.

Okay, I think I can now get rid of this part entirely by implementing your other suggestion for finding references to pool allocations, as that would (I think) eliminate the need to zero pages entirely. I'll put the code in and get back to you ASAP with an update (including also the other requested changes).

rbehrends · 2018-08-08T12:55:54Z

The changes turned out to be a bit more intricate than I had expected. They are all contained in a single new commit (for now) instead of integrating them with the existing commits to make review of the additional changes easier.

The "no free objects on page" case must actually be split up; if a page is being allocated from, then nfree contains the max number of objects that fit inside the page, and nfree is only zeroed once the page is used up. Both cases are covered by fl_begin_offset == (uint16_t)-1 and we can then use nfree == 0 to figure out if the whole page is in use or if it's still being allocated from.

Even with all the options suggested to identify pointers to the interior of a pool object, we still need to distinguish between a freshly allocated object and a freelist entry. We can identify freelist entries fairly easily, however, because the header points to another header rather than to another object (i.e. (p - pagestart - GC_PAGE_OFFSET) % osize == 0).

A remaining problem was that the linked list in newpages could contain multiple pages that were being allocated from. Checking all of them would have required iterating over potentially dozens of pages. In order to avoid that overhead, I needed to change reset_page() so that the page was inserted after the first one, not before it. This way, only the first page in the newpages list was ever being used for allocation. An additional benefit is that this should reduce fragmentation, as we can't end up with multiple partially used pages.

The API call jl_gc_enable_conservative_gc_support() could be removed entirely as a result of these changes.

I also removed jl_gc_is_internal_obj_alloc(). It had confusing semantics and only existed for performance reasons, which aren't a concern here anymore.

yuyichao · 2018-08-08T13:08:41Z

We can identify freelist entries fairly easily, however, because the header points to another header rather than to another object (i.e. (p - pagestart - GC_PAGE_OFFSET) % osize == 0).

I don't think that'll work reliably with small cell size and that's why I said,

In this case, you can use the age to tell the difference. The only issue is the reset age done in the finalizer marking so the only change you'll need to make is to disable the age reset there.

This way, only the first page in the newpages list was ever being used for allocation.

This should be reflected in the comment.

rbehrends · 2018-08-08T15:58:54Z

I don't think that'll work reliably with small cell size

The only concern here seems to be osize == sizeof(jl_taggedvalue_t), i.e. a pool of empty objects. But this should be safe also, as the type of the object cannot have size zero. For all other values of osize, a pointer to a type can always be distinguished from a pointer to a freelist by the offset.

In this case, you can use the age to tell the difference.

This is where I'm not sure how that would work. In jl_gc_pool_alloc(), popping an object off the freelist does not change the age. I.e. we cannot use the corresponding age bit to distinguish between the cell just before the allocation (when it is on the freelist) and after the allocation (when it becomes an object).

We can use the age bit to tell if a cell survived sweep_page(), but we cannot tell if it was allocated again after that or remains on the freelist.

A way to solve this (and which seems to work, based on a quick test) is to compare the pointer to pool->freelist before testing the age (if its address is less than the freelist address, it was either allocated before or picked from the freelist). However, this relies on items in the freelist being in ascending order per page, which seems to be a somewhat fragile invariant (though jl_gc_pool_alloc() also relies on it).

This should be reflected in the comment.

It already is, though I think I'll clarify the wording.

vtjnash · 2018-08-08T19:45:57Z

I don't know if this information is helpful, but some of this sounds like what gc_verify_tags_page also needed to do.

rbehrends · 2018-08-09T11:10:26Z

I've pushed the relevant changes to use age instead as the arbiter of whether we're dealing with an allocated or free cell.

A couple of potential concerns remain, though.

1. I'm not sure if and when a page in pool->newpages is ever swept. In testing, sweep_page() seems to never be called with a page in pool->newpages as an argument (and I don't think it would work, anyway), but I'm uncertain why it does not seem to happen.

2. I had to change the value of jl_buff_tag; while it seems to be somewhat arbitrary, I'd like to make sure that its exact value doesn't have a deeper purpose. The issue here is that objects with such a type should never be passed to marking functions (as that would crash the mark loop). I changed the value slightly so that jl_buff_tag % GC_PAGE_SZ == 0 and jl_buff_tag can never accidentally be the same value as an actual type. (This may also be relevant for code in gc-debug.c).

fingolfin · 2018-08-10T09:20:02Z

@yuyichao is this more like what you had in mind? Any remaining concerns?

I see that CircleCI tests failed, with an error message saying "ERROR: LoadError: did not find test files for the following tests: OldPkg/pkg". Not sure what to do about that, any hints? (Perhaps @rbehrends already knows

rbehrends · 2018-08-10T09:46:31Z

@fingolfin I suspect that I need to rebase this onto a more recent version, it may have to do with Pkg changes? The tests themselves see to work (unsurprisingly, as I ran them myself before pushing).

KristofferC · 2018-08-10T15:57:53Z

Yes, circle CI is not very smart and using an old config file. A rebase on master will fix it.

rbehrends · 2018-08-17T11:53:50Z

Having rebased onto the current master, everything seems to pass now.

yuyichao · 2018-08-17T11:57:25Z

src/gc.c

+static jl_gc_callback_list_t *gc_cblist_notify_external_alloc;
+static jl_gc_callback_list_t *gc_cblist_notify_external_free;
+
+#define gc_invoke_callbacks(kind, args) \


This is still not fixed. (Again, function like macro is fine, but not name concatination).

So, just to clarify: you prefer that we replicate this code 6 times? The justification being that this way one can grep for e.g. gc_cblist_root_scanner to find where it is used?

A simple alternative to achieve that would be to add a comment to each of the 6 invocations of gc_invoke_callbacks, e.g.

// iterate over gc_cblist_root_scanner gc_invoke_callbacks(notify_external_alloc, (v, allocsz));

So, just to clarify: you prefer that we replicate this code 6 times?

That'll be better, yes. But you don't have to, The only bad thing about this is the symbol concatenation for the variable and type names, just spell them out explicitly will be fine.

Got it. I had seen your earlier comment, but thought that was about the public API only, which I changed accordingly (whereas this use is internal). This is easy enough to change, too, but to explain why I did it: I wanted to keep the code typesafe so that list name and type always match. This is not very critical, though, as this is only used in a handful of places and only in one file, so if avoiding symbol concatenation is the bigger priority that's also not a problem.

Yes, I know type safety is the issue. There are enough of these in the GC code that I hope we could use c++ here but fortunately most of these could be solved without too much more typing and without relying on a lot of macro tricks so far.............

BTW, I notice that there are quite a few macros in Julia which use preprocessor concatenation, and arguably for symbols were this is far more inconvenient than for this fairly localized and special one. E.g. I just spent some time tracking down where jl_box_uint64 is defined. Other examples include JL_TYPECHK and code for intrinsics... So it's not quite clear to me why those are OK, but ours is not?

Congratulations. That exact macro is the very reason I hate this kind of macro so much. Unless I actually want to change something about these macros I'm not a fan of just changing those but I also don't want to introduce new ones.

OK, I understand now: you really want all gone, and the existing ones are "legacy"

yuyichao · 2018-08-17T12:04:03Z

src/gc.c

+            return NULL;
+        }
+        // Not a freelist entry, therefore a valid object.
+        valid_object:


Less indent for the label, see other places in the code.

Okay, that's easy enough.

yuyichao · 2018-08-17T12:08:34Z

src/gc.c

+        //   a lower address will be an allocated object, and for cells
+        //   with the same or a higher address, the corresponding age
+        //   bit will reflect whether it's on the freelist.
+        // Age bits are set in sweep_page() and are 0 for freelist


As I said, for this to work reliably you still need to disable the age reset in the finalizer marking. There should be only a single place where mark_reset_age is set to 1 so just disable that. Also, in general, it's still good to have an API for enabling support for conservative scanning. That function should also return whether the conservative scanning support was enabled and you should document that if the setup doesn't guarantee that the support was enabled early enough, the caller should do a full GC afterwards.

Okay, I need to unpack that, as I think I simply misunderstood you before (thinking you were referring to sweeping rather than finalizers). I take it now that mark_reset_age = 1 should be disabled for conservative scanning only?

I had removed the function for enabling conservative scanning because it had become a no-op; but it does make sense to have one regardless in case future changes necessitate having one. And if I understand you correctly, the mark_reset_age change is for the conservative case alone and should be left in place for the normal case, so we'd have to put code in there again anyway.

I take it now that mark_reset_age = 1 should be disabled for conservative scanning only?

Yes. Sorry if it wasn't clear.

but it does make sense to have one regardless in case future changes necessitate having one.

That's exactly what I'm saying.

but it does make sense to have one regardless in case future changes necessitate having one.

And in particular, the behavior of "if the function returns 0 then the caller should do a full GC if it wasn't called right after initialization" should be a behavior that is reasonably easy to maintain.

So, does that mean mark_reset_age is "just" an optimization, which is thus safe to disable for us? I wonder how much impact disabling it has, esp. compared to our previous version of this PR?

Okay. Will do.

Yes it's just an optimization. It could delay the freeing of old finalized objects but should not have any other effects.

@yuyichao Thanks for the explanation, and of course all the feedback

@rbehrends Perhaps we could extend the comment at the declaration of mark_reset_age to state that as part of this PR, so that future people looking at this will have an easier time figuring it out :-)

The proposed semantics for jl_gc_enable_conservative_gc_support() actually run into problems in the multit-threaded case, as other threads may do work (such as allocations) in between the call and the garbage collection, leading to a race condition. I would therefore propose to do the full collection as part of the jl_gc_enable_conservative_gc_support() semantics.

That's fine too. There shouldn't be a issue though since as long as you make sure you don't start doing things that requires conservative scanning before running the GC, the race is benign. Not requiring a external GC does make the API simpler so I'm fine with that. I brought it up only in case you want to reduce the startup delay a little bit.

yuyichao · 2018-08-17T12:08:46Z

src/julia_internal.h

@@ -289,7 +289,7 @@ JL_DLLEXPORT jl_value_t *jl_gc_alloc(jl_ptls_t ptls, size_t sz, void *ty);
 #  define jl_gc_alloc(ptls, sz, ty) jl_gc_alloc_(ptls, sz, ty)
 #endif

-#define jl_buff_tag ((uintptr_t)0x4eade800)
+#define jl_buff_tag ((uintptr_t)0x4eade000)


I assume this is not needed anymore?

This is something that I introduced for the reasons mentioned in an earlier comment. Basically, it is possible for a spurious reference to an object with the type jl_buff_tag to occur with conservative scanning; such an object must not be passed to jl_gc_mark_queue_obj() and similar functions that try to treat the header word as a jl_datatype_t * that they can derefence. So we want to make sure that jl_buff_tag cannot accidentally be mistaken for a value that can legitimately occur in the object header (which is extremely unlikely, but at least theoretically possible). By making it a multiple of GC_PAGE_SZ, we avoid that problem.

Yes, but you should (edit: NOT) be relying on that to distinguish allocated and unallocated cell anymore?
Also, just realized that I forgot to mention, even with the conservative scan, any code that is julia-aware has to use the GC frame explicitly. More specifically, any code that creates and access julia array in C code has to do that since otherwise the array object itself might not be anywhere on the stack while the buffer is still being used which will cause 1) (from the current implementation) the buffer won't be scanned and 2) the buffer itself doesn't even contain enough information about what's in there.

Yes, but you should (edit: NOT) be relying on that to distinguish allocated and unallocated cell anymore?

Hmm. I was under the impression that objects with jl_buff_tag can still happen to occur as a live object. While they get special treatment in the mark loop, I believed that they can still occur in other contexts (e.g. when allocated off pool->newpages) in ways that make them indistinguishable from normal allocations.

Also, just realized that I forgot to mention, even with the conservative scan, any code that is julia-aware has to use the GC frame explicitly. More specifically, any code that creates and access julia array in C code has to do that since otherwise the array object itself might not be anywhere on the stack while the buffer is still being used which will cause 1) (from the current implementation) the buffer won't be scanned and 2) the buffer itself doesn't even contain enough information about what's in there.

I think this is something that simply needs to be documented. Namely, that arrays need to be kept visible to the Julia GC even if they are not visible on the stack anymore if you are accessing its elements. E.g. using JL_GC_PUSH1() and friends, storing them in a Julia object or a root while accessing elements, or some such. But yes, I think this is something that requires explicit documentation.

Hmm. I was under the impression that objects with jl_buff_tag can still happen to occur as a live object.

Ah, so what you want to distinguish jl_buff_tag from a real tag reliably. I though you needed to use this to distinguish between a unallocated and allocated cell. This use is fine then, but it should be documented near the definition of the jl_buff_tag, and possibly a "see also" here as well.

Sounds good. :)

and possibly a "see also" here as well.

Oh, and by "here" I mean where you check that in the code. I thought that's where this comment thread is....

yuyichao · 2018-08-17T12:54:22Z

src/gc.c

+        // Insert free page after first page.
+        // This prevents unnecessary fragmentation from multiple pages
+        // being allocated from at the same time. Instead, objects will
+        // only ever be allocated from the first object in the list.


Oh, and this comment is still missing a mention of the relation to conservative scanning that I requested last time. That won't be clear for future reader out of the context of this PR.

Noted. Will do that. I hadn't realized that you wanted me to make the connection to conservative scanning and thought you simply wanted the logic explained. But putting it also in the context of conservative scanning makes perfect sense.

@rbehrends hmm, this comment has not yet been addressed, has it?

It has, the comment now explicitly notes thatjl_gc_internal_obj_base_ptr() (which is part of the conservative marking machinery) relies on it.

rbehrends · 2018-08-21T11:55:03Z

I believe I have addressed everything now (barring a fresh rebase of the commits). Tests pass locally, too.

fingolfin · 2018-08-22T17:17:45Z

src/gc.c

+{
+    size_t p = 0;
+    for (size_t i = 0; i < objs->len; i++) {
+        jl_value_t *v = (jl_value_t *)(objs->items[i++]);


Is i incremented twice on purpose ? If so, IMHO a comment should explain this. If not..,

No, that's a bug (and would result in a resource leak). I've addressed this along with the other comments and amended the existing commits.

fingolfin · 2018-08-22T17:19:58Z

src/gc.c

+
+static void gc_sweep_foreign_objs(void)
+{
+    for (int i = 0;i < jl_n_threads;i++) {


Insert spaces after semicolon?

fingolfin · 2018-08-22T18:01:10Z

src/julia_gcext.h

+
+// Types for mark and sweep functions.
+// We make the cache and sp parameters opaque so that the internals
+// do not get exposed.


Huh, cache and sp? Outdated comment?

fingolfin · 2018-08-22T18:02:14Z

src/julia_gcext.h

+
+// Kinds of callbacks. For each kind of callback, there must be
+// a corresponding type with the same name, but in all lowercase,
+// and with a "_t" suffix.


This comment is outdated and now not very helpful. Just remove it?

fingolfin · 2018-08-22T20:08:44Z

src/julia_gcext.h

+// marking. Conservative marking will treat any machine word that points
+// to an object (potentially including its interior) as a valid
+// reference, regardless of whether it actually is one or just an
+// integer that happens to match an actual address.


To me, this comment sounds as if conservative marking was enabled by this switch; the fine distinction that only "support" for it is enabled, and that the caller needs to perform additional work (and which) is easy to miss. So perhaps nake that clear?

Here, being aware of the Julep would be useful... Would it be appropriate to insert a reference to it in a comment?

fingolfin · 2018-08-22T20:18:07Z

test/gcext/gcext-test.jl

+    if m === nothing
+	return false
+    else
+        num = m[1]


Not quite, an unwanted tab slipped in. It should be fixed now.

fingolfin · 2018-08-27T15:21:13Z

@yuyichao @Keno From our perspective, this PR should be ready now; at least we are not aware of any remaining unresolved issues. The failed Travis test seems to be a fluke (it failed running apt-get install), restarting it should hopefully make the tests fail.

Of course we can clean up the commits in this PR if you like us to, but I figured that only makes sense if you are happy with it now; if you want further changes, please let us know.

yuyichao · 2018-09-03T04:03:25Z

src/gc.c

@@ -2125,6 +2292,9 @@ mark: {
            int stkbuf = (ta->stkbuf != (void*)(intptr_t)-1 && ta->stkbuf != NULL);
            int16_t tid = ta->tid;
            jl_ptls_t ptls2 = jl_all_tls_states[tid];
+            ptls->last_gc_mark_sp = &sp;


I think I mentioned this in an early comment. This leaks the address of sp, should be replaced by copy instead.

I don't understand what you mean by "replaced by copy" here?

And I don't think you mentioned this before (nor could I find such a comment). To the contrary, our first version of the Julep and the code in our branch (from before the PR was submitted) did not have last_gc_mark_sp but rather used a more complicated approach; it was then suggested to us by you to move this into the TLS (for which we were quite grateful, as it simplified some things considerably).

Anyway, perhaps you could elaborate on what you imagine should replace this (while ensuring that jl_gc_mark_queue_obj and jl_gc_mark_queue_objarray keep working)

so it's better to just explicitly synchronize them at those point instead of leaking &sp from the beginning of the function.

That's actually what i mean by explicitly synchronize instead of leaking address. So instead of

ptls->last_gc_mark_sp = &sp; <external call>

Do

ptls->last_gc_mark_sp = sp; <external call> sp = ptls->last_gc_mark_sp;

OK, sure, we can do that.

However (with the caveat that I have not yet had a chance to try the change), I think that means either including gc.h from julia_threads.h to get gc_mark_sp_t, or else using a hard coded buffer size there matching the size of gc_mark_sp_t (and then having a static_assert somewhere to ensure that value stays correct). Which do you prefer? Resp. if you have an alternate idea, please let us know?

def of gc_mark_sp_t could be moved to julia_thread.h and renamed to jl_gc_mark_sp_t. That's the approach used by jl_gc_mark_cache_t.

About copying: this could introduce measurable overhead, as the struct consists of eight words. More importantly, as the contents of the struct change during marking, this would be problematic during root scanning, as there would be no way to retrieve any changes to the struct. And the internals of the struct would be exposed, anyway.

An alternative idea would be to explicitly document when the field is valid.

Renaming the type wouldn't be a problem, of course. Part of the reason why I made the field a void * was for purposes of information hiding, i.e. to make it opaque and to not expose its internals. But that could also be done with an opaque struct pointer.

@rbehrends well, can you measure it? If you implement this in a commit atop the PR, we can easily drop it later on if it turns out to be too bad.

I don't understand what problem with "contents of the struct change during marking" is; and what changes you think need to be "retrieved". Could you elaborate?

As to you "An alternative idea", I also don't quite see to what it is meant to be an alternative?

About copying: this could introduce measurable overhead, as the struct consists of eight words.

The issue is that the control flow in this function is very unfriendly to the compiler, to put it mildly, so the compiler won't be able to do any escape analysis following the control flow. The sp is the most accessed variable in the function so I do not want to compromise it's performance. (Note that all the fast path in this function is fully inlined and sp doesn't leak to any other functions). Also note that since there aren't that many callee saved registers on x86, all of the function calls are already surrounded by many register spills which are basically copying things around.

as there would be no way to retrieve any changes to the struct

That's why they are copied back?

Hmm, I see the problem.

That's why they are copied back?

Hmm, that would be two struct copies per mark operation. I really have to benchmark that to see what happens. It may be necessary to add to the API to permit bulk operations. I'll go and do a provisional implementation of this approach and see how that turns out, performance-wise.

Edit: Nevermind, I think I know how to cut down on the copying quite a bit. I still have to do the implementation & benchmarking to see how much of an impact it actually has.

Some initial benchmarking seems to indicate that the overhead remains fairly small, even if you have lots of objects with custom mark functions. I still need to figure out why I now have a rare and difficult to reproduce crash as the result of these changes, though.

yuyichao · 2018-09-03T04:03:34Z

src/gc.c

+
+                int old = jl_astaggedvalue(new_obj)->bits.gc & 2;
+                ptls->last_gc_mark_sp = &sp;
+                uintptr_t young = desc->markfunc(ptls, new_obj);


yuyichao · 2018-09-03T04:03:48Z

src/gc.c

@@ -2468,6 +2648,9 @@ static int _jl_gc_collect(jl_ptls_t ptls, int full)

    // 3. walk roots
    mark_roots(gc_cache, &sp);
+    ptls->last_gc_mark_sp = &sp;
+    gc_invoke_callbacks(jl_gc_cb_root_scanner_t,


yuyichao · 2018-09-03T04:05:40Z

src/gc.c

+    if (jl_is_initialized()) {
+        JL_LOCK_NOGC(&conservative_gc_lock);
+        int result = support_conservative_marking;
+        support_conservative_marking = 1;


jl_atomic_exchange and get rid of the lock.

Okay, that's easy enough.

yuyichao · 2018-09-03T04:07:18Z

src/gc.c

+
+JL_DLLEXPORT int jl_gc_conservative_gc_support_enabled()
+{
+    return support_conservative_marking;


Since the other one has lock/atomic, this one should be as well mainly for consistency. jl_atomic_load_acquire.

Will do. It's not strictly necessary, as this is a benign data race, but I agree that it would be less confusing.

fingolfin · 2018-10-04T16:21:44Z

I've updated this PR now. @rbehrends will hopefully work on adding a task scanner test (but IMHO that could also be added in a later PR, i.e. it should IMHO not delay merging of this PR; but of course you may disagree ;-)

rbehrends · 2018-10-04T17:01:55Z

I've added a test for the task scanner functionality now.

fingolfin · 2018-10-04T21:07:10Z

@vtjnash in 082d7d2 you added this in gc-debug.c:

    void *stkbuf = ta->stkbuf;
    if (ta == thread_task && ptls->copy_stack)
        gc_scrub_range(ptls2->stackbase, ptls2->stacksize);
    else if (stkbuf)
        gc_scrub_range((char*)stkbuf, (char*)stkbuf + ta->bufsz);

I am confused by this -- isn't the second argument to gc_scrub_range a pointer? So why is ptls2->stacksize passed to it? Should it perhaps be ptls2->stackbase + ptls2->stacksize instead? On the other hand, in signals-unix.c, I see this:

static int is_addr_on_stack(jl_ptls_t ptls, void *addr)
{
    jl_task_t *t = ptls->current_task;
    if (t->copy_stack)
        return ((char*)addr > (char*)ptls->stackbase - ptls->stacksize &&
                (char*)addr < (char*)ptls->stackbase);
    else
        return ((char*)addr > (char*)t->stkbuf &&
                (char*)addr < (char*)t->stkbuf + t->bufsz);
}

So perhaps in the first snippet, what is really meant is this: (?)

        gc_scrub_range(ptls2->stackbase - ptls2->stacksize, ptls2->stackbase);

This would also fit with code I see in the various jl_init_basefiber implementations:

    ptls->stackbase = stkbuf + ssize;
    ptls->stacksize = ssize;

However, there, usually this is contained in #ifdef COPY_STACKS ... #endif... so stackbase may not be initialized in general?!

Is there any documentation on all this? The code is almost completely undocumented, and I have a hard time reverse engineering it.

rbehrends · 2018-10-05T02:51:23Z

The underlying problem for the crash was that in jl_init_root_task(), the start of the stack was adjusted downwards by 3,000,000 bytes and the stack size was adjusted by the same amount. This led to the stack base for the root task pointing to the middle of nowhere and resulting in a segmentation fault if you tried to access that memory.

Right now, I believe that this needs an abstraction to determine a task's stack area regardless. I've added a function jl_task_stack_buffer() that will (1) give you the actual stack area (and is therefore also future-proof if jl_task_t changes) and (2) will tell you if the task is the current task of a Julia thread (and thus may require no/additional attention, such as skipping guard pages at the bottom of the stack buffer).

vchuravy · 2018-10-05T15:37:01Z

am confused by this -- isn't the second argument to gc_scrub_range a pointer? So why is ptls2->stacksize passed to it? Should it perhaps be ptls2->stackbase + ptls2->stacksize instead?

See f6879e1 where we fixed that.

StefanKarpinski · 2018-10-05T16:03:14Z

See f6879e1 where we fixed that.

So maybe just rebasing this PR on top of #29489 would fix the issue?

rbehrends · 2018-10-05T18:47:29Z

@StefanKarpinski I don't think this is necessary. As I wrote above, the core issue appeared to be that the stack buffer of the root task of the main thread was manipulated in a way that made the start of the stack buffer point to the middle of nowhere. I addressed that in 83bc53b.

vtjnash · 2018-10-05T18:57:56Z

Note that because these stacks "grow down", start and end are reversed from usual. So here, we're intentionally manipulating the end of the stack buffer to point at the middle of nowhere (well, actually into the guard pages at the bottom). But you should never scan that far down the stack, just start at your best guess for the start and end at the current frame pointer for that thread. We can't actually determine where the stack starts and ends on most operating systems. Although we could bound it more precisely than we currently do (because we know the precise size of it and the address of one point that's somewhere in the middle).

fingolfin · 2018-10-05T19:01:04Z

Yeah, so from our end, everything is working fine right now, so if this PR was merged as is, we'd be happy. But of course we'll also try to address any further questions or change requests you may have!

Keno

I have only done a cursory review of the latest, but I did look over the original proposal fairly closely and that looked reasonable to be, so if @yuyichao is ok with the implementation as is, I'm happy to have this merged. If anything else comes up we can do that in follow-on PRs.

Keno · 2018-10-05T20:06:55Z

Separately, you should be commended on an excellent PR-submitting process. Getting features like this through the review process is non trivial and a lot of work, but I think you've been most excellent about it.

fingolfin · 2018-10-05T22:09:41Z

Thank you for the kind words, and thanks to all involved for the support, esp. @yuyichao for many helpful remarks and careful reviewing!

kim366 · 2018-10-28T07:09:50Z

I have read through the proposal and it seems like exactly what I need. I have a question, though: if I use jl_gc_mark_queue_obj to root a value, how do I release it when I am done? Also, is this a replacement for JL_GC_PUSH/JL_GC_POP?

fingolfin · 2018-10-28T07:27:12Z

You must only call jl_gc_mark_queue_obj during a garbage collection (in particular, during each you want the object to survive). As such, the reference will be "released" when you stop calling jl_gc_mark_queue_obj on it during each GC.

kim366 · 2018-10-28T14:43:08Z

I see. Thanks for clarifying. Can't wait until the release :)

fingolfin reviewed Jul 31, 2018

View reviewed changes

ararslan added the GC Garbage collector label Jul 31, 2018

ararslan requested review from yuyichao and Keno July 31, 2018 16:37

yuyichao requested changes Jul 31, 2018

View reviewed changes

fingolfin mentioned this pull request Aug 7, 2018

Setup Travis tests for Julia integration gap-system/gap#2685

Closed

fingolfin mentioned this pull request Aug 9, 2018

Updates to the Julia GC integration gap-system/gap#2688

Merged

rbehrends force-pushed the rb/gc-extensions branch from 178b967 to afa86d7 Compare August 17, 2018 10:06

yuyichao requested changes Aug 17, 2018

View reviewed changes

yuyichao reviewed Aug 17, 2018

View reviewed changes

fingolfin reviewed Aug 22, 2018

View reviewed changes

rbehrends force-pushed the rb/gc-extensions branch 2 times, most recently from 66a98c4 to ed02b83 Compare August 22, 2018 18:02

fingolfin reviewed Aug 22, 2018

View reviewed changes

rbehrends force-pushed the rb/gc-extensions branch from ed02b83 to d4f0aa6 Compare August 23, 2018 07:02

vchuravy requested a review from yuyichao August 23, 2018 17:26

rbehrends force-pushed the rb/gc-extensions branch from d4f0aa6 to bd880ea Compare August 24, 2018 13:26

yuyichao requested changes Sep 3, 2018

View reviewed changes

rbehrends added 2 commits October 5, 2018 01:40

Add functionality to find a task's stack buffer.

83bc53b

Add a test for the task scanner.

7942395

rbehrends force-pushed the rb/gc-extensions branch 3 times, most recently from 95b5ddd to 8f17ca2 Compare October 5, 2018 02:49

rbehrends force-pushed the rb/gc-extensions branch 2 times, most recently from 530838d to 7942395 Compare October 5, 2018 12:09

Keno approved these changes Oct 5, 2018

View reviewed changes

Keno merged commit d461d56 into JuliaLang:master Oct 5, 2018

c42f mentioned this pull request Oct 28, 2018

GC rooting with new GC extensions API JuliaInterop/libcxxwrap-julia#19

Closed

rbehrends mentioned this pull request May 20, 2019

Prevent the GC task callback from segfaulting. #32088

Merged

fingolfin mentioned this pull request Feb 3, 2020

Backport GC Extensions to Julia 1.0 #34644

Closed

fingolfin mentioned this pull request Jun 9, 2020

Accessors for current_task, safe_restore in TLS #36064

Merged

fingolfin mentioned this pull request Jul 22, 2020

Better integration of "foreign types"? #36770

Closed

fingolfin deleted the rb/gc-extensions branch July 23, 2020 09:27

vtjnash mentioned this pull request Apr 19, 2022

gc-ext: only sweep unmarked objects #45035

Merged

GC Extensions #28368

GC Extensions #28368

Conversation

rbehrends commented Jul 31, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fingolfin Jul 31, 2018 • edited Loading

Choose a reason for hiding this comment

yuyichao left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rbehrends commented Aug 8, 2018

yuyichao commented Aug 8, 2018

rbehrends commented Aug 8, 2018 • edited Loading

vtjnash commented Aug 8, 2018

rbehrends commented Aug 9, 2018

fingolfin commented Aug 10, 2018 • edited Loading

rbehrends commented Aug 10, 2018 • edited Loading

KristofferC commented Aug 10, 2018

rbehrends commented Aug 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuyichao Aug 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fingolfin Aug 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rbehrends commented Aug 21, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fingolfin commented Aug 27, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rbehrends Sep 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fingolfin Jul 31, 2018 •

edited

Loading

rbehrends commented Aug 8, 2018 •

edited

Loading

fingolfin commented Aug 10, 2018 •

edited

Loading

rbehrends commented Aug 10, 2018 •

edited

Loading

yuyichao Aug 17, 2018 •

edited

Loading

fingolfin Aug 22, 2018 •

edited

Loading

rbehrends Sep 5, 2018 •

edited

Loading