Scheduler does not honour available disk space on long term storage #8566
@magik6k this might be one of the only remaining issues in the scheduler logic
For me this has quite high priority - apart from anything else, it's keeping all your eggs in one basket.
Because stor-14 booted/registered first (or last) with the miner, that one is getting all the FETCH jobs. All we need is round-robin task queueing - that would fix all of this (and possibly other scheduling issues too).
So I was able to reproduce this on a local network, but it takes a bit of configuration. Happy to hand you the login credentials to this server @shrenujbansal, so you can see the issue faster and hopefully find and confirm a potential fix 😄 It's based on the most recent master.
Create 3 tmpfs of 100M that the storage-only lotus-workers will use, then initialize the 3 storage-only lotus-workers (I set up a screen session for each, for easier management):
First storage-only lotus-worker (uses /root/storagelotusworker1)
Second storage-only lotus-worker (uses /root/storagelotusworker2)
Third storage-only lotus-worker (uses /root/storagelotusworker3)
Create a regular sealing-worker (uses /root/sealingworker). A command sketch for these steps follows below.
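A minimal command sketch of the setup above, assuming a stock lotus-worker; the listen addresses, repo paths and API env vars are illustrative and should be adapted to your setup (check `lotus-worker run --help` and `lotus-worker storage attach --help` for your version):

```sh
# Hedged sketch of the repro setup; paths and ports are illustrative.
# MINER_API_INFO=<token>:<miner multiaddr> must be exported for each worker.

# 1. Create the 100M tmpfs mounts the storage-only workers will use.
for i in 1 2 3; do
  mkdir -p /root/storagelotusworker$i
  mount -t tmpfs -o size=100M tmpfs /root/storagelotusworker$i
done

# 2. Start a storage-only worker (all sealing task types disabled) in its
#    own screen session; repeat for workers 2 and 3 with their own repo
#    path and listen port.
LOTUS_WORKER_PATH=/root/worker1repo lotus-worker run \
  --listen 0.0.0.0:3456 \
  --addpiece=false --precommit1=false --precommit2=false \
  --commit=false --unseal=false

# 3. Attach the matching tmpfs to that worker as long-term storage.
LOTUS_WORKER_PATH=/root/worker1repo lotus-worker storage attach \
  --init --store /root/storagelotusworker1

# 4. Start the regular sealing worker (all task types left enabled).
LOTUS_WORKER_PATH=/root/sealingworker lotus-worker run --listen 0.0.0.0:3460
```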
Turn off storage and sealing on the miner node itself (a sketch of one way to do this is below).
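A sketch of one way to do that, assuming the default miner repo location and the [Storage] config keys documented for recent lotus versions; verify the key names against your release's config docs:

```sh
# Hedged sketch: stop the miner node from taking sealing tasks itself.
# Edit the existing [Storage] section of ~/.lotusminer/config.toml in place
# rather than appending a duplicate one.
cat >> ~/.lotusminer/config.toml <<'EOF'
[Storage]
  AllowAddPiece = false
  AllowPreCommit1 = false
  AllowPreCommit2 = false
  AllowCommit = false
  AllowUnseal = false
EOF

# For the miner's local storage path(s) (listed by `lotus-miner storage list`),
# mark them as unusable for sealing and long-term storage by setting
# "CanSeal": false and "CanStore": false in that path's sectorstore.json.
```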
And restart the whole system/workers.
The expected behaviour is now that the sectors seal on the sealing-worker, which then sends each sealed sector to one of the three storage-only lotus-workers based on the storage-picking logic (weight * % available space). If we then run a script to pledge a lot of sectors, you should see after a while that only one (or two) of the storage-only lotus-workers gets assigned sectors, even when the last storage-only lotus-worker has more available space. The current situation looks like this:
Here it would be expected that the storage-only lotus-worker with the most available space would be picked instead.
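For reference, the per-path weight in that picking formula comes from the storage path's metadata; a sketch, assuming the standard sectorstore.json layout in each attached long-term storage path:

```sh
# Hedged sketch: the "Weight" used by the storage-picking heuristic lives in
# sectorstore.json inside each attached long-term storage path; the worker's
# reported available space supplies the other half of the formula.
cat /root/storagelotusworker1/sectorstore.json
# {
#   "ID": "<uuid>",
#   "Weight": 10,
#   "CanSeal": false,
#   "CanStore": true,
#   ...
# }
# The weight can also be set when attaching the path, via the --weight flag
# on `storage attach`.
```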
Discussing this issue with @magik6k and looking at the code, the scheduler currently schedules tasks based on compute utilization. Since the GETs require very little compute, all such tasks get assigned to the first available worker, which is why we see sectors getting assigned to only a single worker and not to the other available workers. There are 2 options for resolving the above problem: assign these tasks round-robin across the eligible workers, or assign them based on storage utilization.
Round robin makes very good sense from a performance-scaling perspective (getting more capacity and throughput by adding more workers/paths). It does come with some issues that need addressing.
Otherwise, using the storage utilisation could be right for some use cases, but not for all. For example, I have a storage cluster with 16 JBODs / 16 individual paths with a worker for each, all filled up 50%. Now I add yet another, but empty, JBOD (so 0%), and based on storage utilisation the scheduler will send every sector to this single worker/path. This would reduce the number of parallel GETs from using all workers to using only one, and throttle my sealing output a lot!

There could also be a way where it's not round robin, but maybe just an "allow only x concurrent GETs per worker". Only allowing a worker to have one GET at a time would force the scheduler to move on to the next "free" worker that is not currently occupied with a GET, and it could still rank workers by storage utilisation. NOT EASY, but this would capture the "don't send ALL sectors to the new worker" goal while still spreading out the load.

Lastly, I think it would be optimal if the SP could choose between strategies, like we can with the current "assigner spread". In an ideal world we would have "utilization" and "spread", maybe combined with a concurrent-GET limiter, so the scheduler is automatically forced to spread out the load rather than only hitting the same path. It's not a problem that it wants to fill up the new empty storage; we just don't want it scheduling 30 GETs against a single worker while the rest are idle. Not an easy ask, I know, but this would have a HUGE impact on storage efficiency and could make the current misfit network-storage strategies unnecessary - basically making it much easier for SPs to move beyond direct-attached storage without the pitfalls of using network storage with lotus, which is very hard to do well.
Is this not supposed to already work using the GET limiter?
@clinta It would certainly make sense to load balance with the GET limiter, but I'm quite certain that functionality has issues in lotus as well. @rjan90 can confirm that we see GET limitations not getting enforced: #9213 (comment). Also, I used to run our workers with the flag.
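For reference, the limiter discussed here is exposed as a per-worker flag (and as ParallelFetchLimit in the [Storage] config); a sketch of capping a storage-only worker to one concurrent fetch, with the caveat from above that the limit reportedly isn't always enforced in practice (#9213):

```sh
# Hedged sketch: cap concurrent fetches on a storage-only worker to 1 so the
# scheduler has to move on to another worker for additional GETs.
LOTUS_WORKER_PATH=/root/worker1repo lotus-worker run \
  --listen 0.0.0.0:3456 \
  --parallel-fetch-limit 1 \
  --addpiece=false --precommit1=false --precommit2=false \
  --commit=false --unseal=false
```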
Experiments in #10356 may make this better.
Just ran the above test with one of the new experimental assigners from #10356. It shows quite a good improvement over the current behaviour: with the experimental assigner, the sectors are spread across the storage-only lotus-workers rather than piling up on a single one.
Is the underlying issue fixable in the current scheduler without a rewrite?
As far as I understand it's not possible without rewriting the whole scheduler - which we are working towards in the [EPIC] Lotus Miner v2 - External task queue milestone. These new experimental assigners can be seen as a mitigation to get storage-only lotus-workers to actually work in such systems while we work towards the task queue. For reference, a sketch of how an assigner is selected follows below.
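A minimal sketch of selecting the assigner on the miner node, assuming the [Storage] Assigner key documented for recent lotus versions; the exact names of the experimental assigners from #10356 should be checked against that PR or the config docs for your release:

```sh
# Hedged sketch: choose the task assigner in the miner's config.toml
# (default is "utilization"; "spread" is the documented alternative).
# The experimental assigners from #10356 are selected the same way, using
# the names defined in that PR. Edit the existing [Storage] section in
# place rather than appending a duplicate one.
cat >> ~/.lotusminer/config.toml <<'EOF'
[Storage]
  Assigner = "spread"
EOF
# Restart lotus-miner for the change to take effect.
```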
Yeah, you might see multiple sectors hitting the same path with these assigners as well.
Could another mode be considered where an idle worker takes the initiative to claim tasks itself? That could reduce a lot of the work the miner has to do.
Describe the Bug
When running lotus-workers with all task flags set to false - so only the remaining default tasks are left - and with just long-term storage attached, a single worker gets mass-assigned GETs even though there are free workers with more free disk space.