
fix job consume #1526

Merged
merged 5 commits into from
May 16, 2018

Conversation

psychocrypt
Collaborator

@psychocrypt psychocrypt commented May 1, 2018

fix #1505
fix or reduce #1337 #1222

  • remove switch_work in all minethd classes
  • move consume_work into globalStates
  • remove the limit that each worker thread must consume a job before a new job can be received
  • the miner should no longer freeze if one GPU is hanging

In the previous implementation it was possible that a worker thread consumed an already old job if the pool sent jobs faster than a batch of nonces was calculated. The reason is that the current implementation forces each worker to consume the oldest job in the queue before a new job can be added. On a GPU, the time to process a job (a batch of nonces) before checking again whether the job has changed is much longer than on a CPU: a GPU typically needs ~1 s or more (slow GPUs sometimes 10 s).

example of the PR behavior if jobs change faster than they are processed:

    1. job switch
    2. thread consumes the job
    3. job switch
    4. job switch again
    5. the thread finishes the job from step 2 and never consumes the job from step 3; it jumps directly to the newest job

Test

  • CPU
  • AMD
  • NVIDIA

[update:] I thought there was a race condition and a deadlock in the code I changed. After discussions with @fireice-uk it looks like I was wrong, therefore I updated the PR description.

@fireice-uk
Owner

This isn't correct. Consume needs to be per thread. Please explain in numbered steps how you think a deadlock can happen.

@fireice-uk
Owner

fireice-uk commented May 2, 2018

Also please note that a deadlock in worker threads wouldn't affect the network thread. I see some people already think it is some kind of magical client-side fix for nicehash having shitty software.

With regards to nicehash, I think we should silence them like cast-xmr but make it optional with default to false.

Owner

@fireice-uk fireice-uk left a comment


You don't improve code by having fewer protections.

@psychocrypt psychocrypt changed the title from "fix job consume (possible deadlock)" to "fix job consume" on May 2, 2018
@psychocrypt
Collaborator Author

I updated my PR description because I was wrong with the deadlock part.

@@ -5,7 +5,7 @@ This application bundles the following third-party software in accordance with t
 Package: Original NVidia mining code
 Authors: tsiv and KlausT
 License: GNU GPLv3
-Notes: Improvements are (c) of Xmr-Stak team
+Notes: Improvements are (c) of Xmr-Stak team team and are covered by GNU GPLv3
Owner


team team


Package: cpputil
Authors: Will Zhang
Source: https://github.com/willzhang4a58/cpputil
Owner


Nice to see. Big thanks for dealing with licensing and big thanks to @willzhang4a58 for the code.

*/

#ifndef CPPUTIL_READ_WRITE_LOCK_H_
#define CPPUTIL_READ_WRITE_LOCK_H_
Owner


Can we get this to the standard stak format -> tab indent and #pragma once?

jobLock.WriteLock();

// this notifies all threads that the job has changed
iGlobalJobNo++;
Owner


This is in the wrong place. The order that I told you:

when a global job is being written you:

    1. acquire the write lock
    2. write the job
    3. release the lock
    4. increase the jobno

when a thread gets a job:

    1. acquire the read lock
    2. copy the job
    3. set the job no to the global one (!!! note the difference with the write lock)
    4. release the read lock

This is slightly more efficient since the threads only start to read when the resource is available.

@psychocrypt
Collaborator Author

updated

@fireice-uk
Owner

Needs a rebase before testing

fix fireice-uk#1505

- fix possible deadlock of the executor thread
- fix race condition during the job consumption
- remove `switch_work` in all `minethd` classes
- move `consume_work` into `globalStates`

add log class from Will Zhang:

Package: cpputil
Source: https://github.com/willzhang4a58/cpputil
License: MIT License

use read write locks to be sure that no job is consumed during the job update

- reformat `read_write_lock.h`
- fix spelling issue
- move the job id increase of the write to the bottom
@psychocrypt
Collaborator Author

I removed the BUG label. The requirement that all workers must consume a job before a new job can be added for the workers is not a real bug. A drawback of this restriction is that the miner can keep mining old jobs even after newer jobs have been received, but I do not know if this can be defined as a BUG.

@fireice-uk fireice-uk merged commit 3550378 into fireice-uk:dev May 16, 2018
@psychocrypt psychocrypt mentioned this pull request May 22, 2018
1 task