GET_xx_MAX_CONCURRENT env variable is not being enforced #9407

Closed
8 of 18 tasks
Tracked by #10338
rjan90 opened this issue Oct 3, 2022 · 3 comments · Fixed by #10356

Comments

@rjan90
Contributor

rjan90 commented Oct 3, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, or the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon:  1.17.2-rc2+mainnet+git.3d23a54fe+api1.5.0
Local: lotus version 1.17.2-rc2+mainnet+git.3d23a54fe

Describe the Bug

The GET_XX_MAX_CONCURRENT environment variable is not being enforced between an AP-worker and a PC1-worker. I also tried exporting all the different GET_size_MAX_CONCURRENT environment variables, to rule out something similar to this issue, where the sealer doesn't set the sector seal proof type on the scheduler request.

Logging Information

Repo Steps

  1. Run two separate workers:
  • One with AP enabled and all other tasks disabled.
  • The second with AP disabled and all other tasks enabled.
  2. Set GET_sectorSize_MAX_CONCURRENT=2 on the second worker.
  3. Pledge 3-4 CC-sectors to get multiple GET-tasks after AP is done.
  4. See that GET_sectorSize_MAX_CONCURRENT=2 is not enforced when the tasks are transferred to the worker doing everything apart from AP.
@magik6k
Contributor

magik6k commented Feb 27, 2023

This is probably because resources in the preparing step are accounted separately from those in the executing step.

The easiest fix is to share task counters between the preparing and executing resource windows.

@rjan90
Contributor Author

rjan90 commented Feb 28, 2023

Scheduled 3 CC-sectors on an AP-worker, with a remote PC1 worker running with GET_32G_MAX_CONCURRENT=1, to check whether the limit was enforced on branch lotus version 1.21.0-dev+mainnet+git.2316363f7.
3 APs running

sealworker1@sealworker1-RS500A-E10-RS4:~$ lotus-miner sealing jobs
ID        Sector  Worker    Hostname   Task  State    Time
3f132731  5076    7ea9023f  APworker1  AP    running  3m51.1s
4b1886f9  5077    7ea9023f  APworker1  AP    running  3m50s
f89828eb  5078    7ea9023f  APworker1  AP    running  3m48.9s

But still seeing all GETs running even when the limit was set to 1:

lotus-miner sealing jobs
ID        Sector  Worker    Hostname  Task  State    Time
369ac0b2  5076    186f382d  misty     GET   running  11.9s
73b6ab7c  5078    186f382d  misty     GET   running  10.6s
e00fd02d  5077    186f382d  misty     GET   running  10.6s

And the workers view shows the same:

lotus-miner sealing workers
Worker 186f382d-16b9-4a97-b6f1-4a32b06e9c08, host misty
	TASK: PC1(3) 
	CPU:  [                                                                ] 0/128 core(s) in use
	RAM:  [|                                                               ] 1% 10.59 GiB/995.6 GiB
	VMEM: [|                                                               ] 1% 10.59 GiB/995.6 GiB
	GPU:  [                                                                ] 0% 0.00/1 gpu(s) in use
	GPU: NVIDIA RTX A2000, not used
Worker 7ea9023f-2635-4265-b743-40a4e670582d, host APworker
	CPU:  [                                                                ] 0/16 core(s) in use
	RAM:  [|                                                               ] 2% 11.21 GiB/503.7 GiB
	VMEM: [|                                                               ] 2% 11.21 GiB/503.7 GiB
	GPU:  [                                                                ] 0% 0.00/1 gpu(s) in use
	GPU: GeForce RTX 2080 Ti, not used

@rjan90 rjan90 added this to the v1.21.0 milestone Feb 28, 2023
@rjan90
Contributor Author

rjan90 commented Feb 28, 2023

This is working as of the latest commit, c484c38.

Scheduled 3 CC-sectors on the AP-worker, with the remote PC1 worker with GET_32G_MAX_CONCURRENT=1.

lotus-miner sealing jobs
ID        Sector  Worker    Hostname  Task  State    Time
0827416c  5083    a1d0f466  APworker  AP    running  3m25.3s
6babe413  5084    a1d0f466  APworker  AP    running  3m18.9s
174b384d  5085    a1d0f466  APworker  AP    running  3m12.3s

When those three finished, the GETs were properly limited: one GET runs while the other two sectors wait to be assigned for PC1, which first requires a GET to the PC1-worker:

lotus-miner sealing jobs
ID        Sector  Worker    Hostname  Task  State        Time
32e5ee03  5083    5a247609  misty     GET   running      16.2s
00000000  5084    5a247609  misty     PC1   assigned(1)  3.7s
00000000  5085    5a247609  misty     PC1   assigned(1)  300ms

The limit is also visible in the output of the lotus-miner sealing workers command:

lotus-miner sealing workers
Worker 5a247609-87d8-41a7-bc82-5543f3156601, host misty
	TASK: GET(1/1) 
	CPU:  [                                                                ] 0/128 core(s) in use
	RAM:  [|                                                               ] 1% 10.72 GiB/995.6 GiB
	VMEM: [|                                                               ] 1% 10.72 GiB/995.6 GiB
	GPU:  [                                                                ] 0% 0.00/1 gpu(s) in use
	GPU: NVIDIA RTX A2000, not used
Worker a1d0f466-1920-4d78-90c6-1d3e81e70840, host APworker
	CPU:  [                                                                ] 0/16 core(s) in use
	RAM:  [|                                                               ] 2% 11.14 GiB/503.7 GiB
	VMEM: [|                                                               ] 2% 11.14 GiB/503.7 GiB
	GPU:  [                                                                ] 0% 0.00/1 gpu(s) in use
	GPU: GeForce RTX 2080 Ti, not used
