Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anyone fancy playing with the Lua GC? #3175

Open
TerryE opened this issue Jun 21, 2020 · 14 comments
Open

Anyone fancy playing with the Lua GC? #3175

TerryE opened this issue Jun 21, 2020 · 14 comments

Comments

@TerryE
Copy link
Collaborator

TerryE commented Jun 21, 2020

I've been looking at how the Lua (5.3) GC works on our builds. At the moment it is pretty standard, but I want to tailor it to more our requirements. The standard GC has two tuning settings that you can tweak with collectgarbage():

  • "setpause". The current default setting is 110, which means that once the GC completes a cycle it waits until the memory use has grown by 10% before unpausing.
  • "setstepmul". The current default setting is 200, but I suspect that we should have this higher say 300 or 400.

The patch below adds a little instrumentation so you can see what is going on, but this could be tweaked. Some general comments:

  • If you play, then you will see that allocating a GCObject (one that is GC collectable, such as a string, proto, table, userdata, etc.) causes a sweep check and if past the pause threshold then a (partial) sweep is made through the GCObject list to find unreferenced ones.
  • Note that freeing doesn't seem to kick off sweeps, but seems to be relying and a stable mix of allocs and frees so that the allocs trigger the sweeps.
  • It seems to me that the idea of having an exponential spacing of the pauses is really arse about face. Unlike PCs which have poodles of RAM, we typically only have ~40-44Kb. Instead of pausing longer between sweeps as RAM use grows, we should start off doing a sweep every 4Kb or some step, and drop this interval (not increase it) as we approach running out of than so by the time we are down to the last 4Kb or so we need to switch into the Lua 5.1 which is to run collectgabage() every allocation.

Also note that a full GC is splil into sweeps and the full GC repeats these until no more RAM can be freed. If you are doing GC at the end of every action function, say, then a slightly lightweight approach might be to do a collectgarbage('step',4) instead.

Anyway if anyone else wants to have a play and get up to speed with how this all works it would really help being able to have an informed discussion about how to tweak this for our default setup.

Thanks Terry


 /*
 ** performs a basic GC step when collector is running
 */
+#ifdef LUA_USE_ESP8266 /*DEBUG*/
+extern void dbg_printf(const char *fmt, ...);
+#define CCOUNT_REG ({ int32_t r; asm volatile("rsr %0, ccount" : "=r"(r)); r;})
+#else                  /*DEBUG*/   
+#define dbg_printf(...)
+#define CCOUNT_REG 0
+#endif                 /*DEBUG*/
 void luaC_step (lua_State *L) {
   global_State *g = G(L);
   l_mem debt = getdebt(g);  /* GC deficit (be paid now) */
@@ -1141,15 +1148,19 @@ void luaC_step (lua_State *L) {
     return;
   }
   do {  /* repeat until pause or enough "credit" (negative debt) */
+/*DEBUG*/ int32_t start = CCOUNT_REG;
     lu_mem work = singlestep(L);  /* perform one single step */
     debt -= work;
+/*DEBUG*/ dbg_printf("singlestep - %d, %d, %u \n", debt, lua_freeheap(), CCOUNT_REG-start);
   } while (debt > -GCSTEPSIZE && g->gcstate != GCSpause);
   if (g->gcstate == GCSpause)
     setpause(g);  /* pause until next cycle */
   else {
+/*DEBUG*/ int32_t start = CCOUNT_REG;
     debt = (debt / g->gcstepmul) * STEPMULADJ;  /* convert 'work units' to Kb */
     luaE_setdebt(g, debt);
     runafewfinalizers(L);
+/*DEBUG*/ dbg_printf("new debt - %d, %d, %u \n", debt, lua_freeheap(), CCOUNT_REG-start);
   }
 }
@nwf
Copy link
Member

nwf commented Jun 22, 2020

Looking at the current system, it seems another perversity is that we're going to follow the exponential growth paradigm until we're bumping up against emergency OOMs and then we're likely to get "stuck" there in some cases:

  • the emergency pathway calls luaC_fullgc() with emergency = 1 which causes it to disable some GC operations, leaving some memory "stuck" for this cycle.

  • GCdebt is not recalculated until the call to setpause at the end, when GCestimate is asserted to equal gettotalbytes. Therefore, when it is recalculated, it will be set to 110% of the current heap size, regardless of how much heap there actually is. If we're riding with more than 1/1.1 ~ 91% ~~ 36KB of heap in use, then we will have "scheduled" our next incremental pass to begin only after the heap is exhausted.

  • So we merrily go along allocating, with checkGC believing that we have plenty of credit, until we slam into the emergency GC again.

This is especially frustrating since the very reason that we might be riding around with so much memory in use might be the stack (less likely) or string table (seems plausible) or pending finalizers (seems very plausible) that we could get rid of if we weren't doing EGC!

I think we should always clamp the credit the GC can give itself to some fraction (half? 3/4ths?) of the available heap (i.e. node.heap() * fraction). In fact, maybe we should just take the credit to be the maximum of that number and a user-specified value (similar to the negative arguments to node.egc.setmode(node.egc.ON_MEM_LIMIT,...)).

@TerryE
Copy link
Collaborator Author

TerryE commented Jun 22, 2020

The default Lua 5.1 strategy -- which is to do a full GC every allocation and free is particularly pathological. Pretty much any move to using the incremental GC as it is implemented in Lua gives factors (maybe 3-4×) improvement in throughput. In this light, if strategy A vs strategy B varies performance by 5 or 10% isn't so important. What is more important is that we don't convert apps that used to run successfully into E:M ones.

Whilst we will always have some Lua devs who manage to run out of RAM through bad programming practices or simply trying to do the wrong class of app on a ~44Kb RAM device, in my personal experience RAM isn't really an issue for the apps that I tackle once the Lua code is moved to LFS. Most rarely use much more than 20Kb, so RAM use is less than 50%.

What I feel that we need is a simple strategy that works efficiently for this ~50% utilisation but slowly degrades into "full GC every allocation" as we near 100% utilisation. Ideally these settings should be pretty robust so that they just work "out of the box" for typical Lua devs. So a couple of thoughts:

  • Doing a sweep step every allocation is typically at least an order of magnitude less expensive than the Lua 5.1 approach of doing a full GC. (Many sweep steps are only ~10 µSec or so). So why bother even to run the GC in pause/start mode? Why not just leave it running (that is have setpause = 100)?
  • We have a similar argument for setting the stepmul in the low to mid hundreds. Worth a play.
  • Doing a 2Kb (say) step is still factors cheaper than a full GC, so maybe we could have two thresholds: say when the heap left is less than 6 (?) Kb we do a 2Kb step per alloc then at say 4Kb remaining we do a full GC per alloc.

I think that we need to play about further on this. BTW, the -ve ON_MEM_LIMIT options is the (yet to be implemented) LUA_GCSETMEMLIMIT option on lua_gc() and collectgarbage('setmemlimit') will do.

@nwf
Copy link
Member

nwf commented Jun 22, 2020

Speaking of E:M, where is that message generated? I couldn't easily find it in the source (but perhaps my git grep mojo is lacking at the moment).

Leaving setpause at >100 seems fine to me, as it should mean that we sweep every additional 2-4 K of allocations, since the heap can only be 20-40K in size, right? That's potentially a lot of allocations that we've not spent 10 uSec on, so the impact may still be significant (but measurements are better than the guesses I'm making here). But when we do sweep, we probably want to be fairly aggressive about reclaim, so a stepmul of >200 seems like a worthwhile thing to consider.

The scenario to avoid is repeatedly slamming into EGC, which I think we can do by clamping the credit calculated by setpause to some/half/most of the remaining heap size. That should give us the kind of two-threshold scenario you describe, though in a soft way:

  • when heap is plentiful, we won't GC too often
  • as the free heap becomes < 10% of the allocated heap, we'll be more aggressive in trying to reclaim memory
  • EGC will still kick in if we actually hit the heap limit

That game can be modified by LUA_GCSETMEMLIMIT to move either the "< 10%" figure and/or the threshold at which we engage EGC.

Anyway, I'm happy to muck about in the GC some. :)

@TerryE
Copy link
Collaborator Author

TerryE commented Jun 22, 2020

The above patch is a good starting point. The CCOUNT deltas are very informative. I'll push through a review copy of the PR tomorrow with Alpha changes to GC tomorrow.

@TerryE
Copy link
Collaborator Author

TerryE commented Jun 22, 2020

BTW, there is something definitely a bit hickey about how MAX_LMEM is defined and used. grep this across app/lua53/*.* to see what I mean. This is defined in llimits.h as the same as the C++ PTRDIFF_MAX constant and is to do with manipulating l_mem fields (a ptrdiff_t type) and explicitly manipulating overflow, rather than anything to do with maximum size of RAM or whatever.

@nwf
Copy link
Member

nwf commented Jun 22, 2020

That just sounds like MAX_LMEM should have been called LMEM_MAX to go along with the C standard INT_MAX and friends.

@TerryE
Copy link
Collaborator Author

TerryE commented Jun 22, 2020

Speaking of E:M, where is that message generated? I couldn't easily find it in the source

Sorry. Didn't answer this one. It's in pvPortMalloc, specifically at offset +196. This does a dbgprintf("E:M %d\n", freemem); if it fails to allocate the requested RAM. I am not sure how to disable this other than patching SDK/libmain.a(mem_manager.o) to NOP out the call.

There are various tuning constants defined:

  • GCSTEPSIZE: 1600 = (cast_int(100 * sizeof(TString)))
  • GCSWEEPCOST 5 = ((sizeof(TString) + 4) / 4)
  • GCSWEEPMAX 64 = (cast_int((GCSTEPSIZE / GCSWEEPCOST) / 4))
  • GCFINALIZECOST: 5 = GCSWEEPCOST
  • STEPMULADJ: 200

Also luaC_step() sets the debt 16Kb (10*GCSTEPSIZE) which at ⅓ of the available RAM is just too large, IMO.

I also note that the 5.3 GC avoids the integrity failures that can occur in 5.1 where an Emergency GC can call a finaliser method which can end up resizing the stack in the middle of GC callback. If does this by not calling the "unsafe" by avoiding running finalisers or resizing the stack during an Emergency GC.

@TerryE
Copy link
Collaborator Author

TerryE commented Jun 24, 2020

In doing an Internet trawl on Lua GC internals, I came across Roberto Ierusalimschy's presentation at the 2018 Lua Workshop.

@stale
Copy link

stale bot commented Jun 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 20, 2021
@stale stale bot closed this as completed Jul 8, 2021
@TerryE
Copy link
Collaborator Author

TerryE commented Sep 26, 2021

@jmattsson Johny, this was the issue that I was talking about

@TerryE TerryE reopened this Sep 26, 2021
@stale stale bot removed the stale label Sep 26, 2021
@jmattsson
Copy link
Member

Interesting discussion, cheers.

@TerryE
Copy link
Collaborator Author

TerryE commented Oct 10, 2021

@jmattsson I've just dumped a work in progress to Notes on the Lua Garbage Collector in the NodeMCU implemention explaining how the Lua GC actually works. I've still got to add a bit on the Emergency LGC and how we should trigger it on a free heap() threshold rather than allocation failure because the latter doesn't leave any freeboard for device driver and other service mallocs. This is about the only tweak needed, as far as I can see.

I've done some forensics on an instrument version with luac -e db.lua where this contains a one-liner debug.debug() to give me a NodeMCU command interface on my ChromeBook. Any on-ESP work with have to wait until I get back to the UK on Thursday.

@nwf, you just might be interested in having a browse as well.

I recommend that the setpause value should be 100, which runs the stepper continuously in the background and a setstepmul of 200, if we do a collectgarbage('step',4) say as the epilogue to any cb.

I've also got some more detailed notes on the internals of lgc.c but I am not sure whether anyone will read them.

@jmattsson
Copy link
Member

Do you reckon that a step collection would make sense to bake into our luaL_pcallx(), or would that be too presumptuous?

@TerryE
Copy link
Collaborator Author

TerryE commented Oct 11, 2021

Enabling this as a (maybe default) option might be a good idea. As I said I've been playing with luac in my Chrome OS Linux sandbox, and doing some instrumentation. The code is so tangled that you really need to instrument to see what is going on. I would also like to do some instrumenting in the ESP environment and run some test cases with various settings to see what a good compromise is. That's a job for when I am back in the UK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants