
fix job consume #1526

Merged
merged 5 commits into from
May 16, 2018

Conversation

psychocrypt
Collaborator

@psychocrypt psychocrypt commented May 1, 2018

fix #1505
fix or reduce #1337 #1222

  • remove switch_work in all minethd classes
  • move consume_work into globalStates
  • remove the limit that each worker thread must consume a job before a new job can be received
  • the miner should no longer freeze if one GPU is hanging

In the previous implementation it was possible that a worker thread consumed an already old job if the pool sent jobs faster than a batch of nonces was calculated. The reason is that the current implementation forces each worker to consume the oldest job in the queue before a new job can be added. On a GPU, the time to process a job (a batch of nonces) before checking again whether the job has changed is much longer than on a CPU: a GPU typically needs ~1 s or more (slow GPUs sometimes 10 s).

example of the PR behavior if jobs change faster than they are processed:

    1. job switch
    2. thread consumes the job
    3. job switch
    4. job switch again
    5. the thread finishes the job from step 2 and never consumes the job from step 3; it jumps directly to the newest job

Test

  • CPU
  • AMD
  • NVIDIA

[update:] I thought there was a race condition and a deadlock in the code I changed. After discussions with @fireice-uk it looks like I was wrong, therefore I updated the PR description.

@fireice-uk
Owner

This isn't correct. Consume needs to be per thread. Please explain in numbered steps how you think a deadlock can happen.

@fireice-uk
Owner

fireice-uk commented May 2, 2018

Also please note that a deadlock in worker threads wouldn't affect the network thread. I see some people already think it is some kind of magical client-side fix for nicehash having shitty software.

With regards to nicehash, I think we should silence them like cast-xmr but make it optional with default to false.

Owner

@fireice-uk fireice-uk left a comment


You don't improve code by having fewer protections.

@psychocrypt psychocrypt changed the title from "fix job consume (possible deadlock)" to "fix job consume" on May 2, 2018
@psychocrypt
Collaborator Author

I updated my PR description because I was wrong with the deadlock part.

@@ -5,7 +5,7 @@ This application bundles the following third-party software in accordance with t
 Package: Original NVidia mining code
 Authors: tsiv and KlausT
 License: GNU GPLv3
-Notes: Improvements are (c) of Xmr-Stak team
+Notes: Improvements are (c) of Xmr-Stak team team and are covered by GNU GPLv3
Owner


team team


Package: cpputil
Authors: Will Zhang
Source: https://github.com/willzhang4a58/cpputil
Owner


Nice to see. Big thanks for dealing with licensing and big thanks to @willzhang4a58 for the code.

*/

#ifndef CPPUTIL_READ_WRITE_LOCK_H_
#define CPPUTIL_READ_WRITE_LOCK_H_
Owner


Can we get this to the standard stak format -> tab indent and #pragma once?

jobLock.WriteLock();

// this notifies all threads that the job has changed
iGlobalJobNo++;
Owner


This is in the wrong place. The order that I told you:

when a global job is being written you:

    1. acquire the write lock
    2. write the job
    3. release the lock
    4. increase the jobno

when a thread gets a job:

    1. acquire the read lock
    2. copy the job
    3. set the job no to the global one (!!! note the difference with the write lock)
    4. release the read lock

This is slightly more efficient since the threads only start to read when the resource is available.

@psychocrypt
Collaborator Author

updated

@fireice-uk
Owner

Needs a rebase before testing

fix fireice-uk#1505

- fix possible deadlock of the executor thread
- fix race condition during the job consumption
- remove `switch_work` in all `minethd` classes
- move `consume_work` into `globalStates`

add log class from Will Zhang:

Package: cpputil
Source: https://github.com/willzhang4a58/cpputil
License: MIT License

use read write locks to be sure that no job is consumed during the job update

- reformat `read_write_lock.h`
- fix spelling issue
- move the job id increase of the write to the bottom
@psychocrypt
Collaborator Author

I removed the BUG label. The requirement that all workers must consume a job before a new job can be added for the workers is not a real bug. A drawback of this restriction is that the miner can keep mining old jobs even after newer jobs have been received, but I do not know if this can be defined as a BUG.

@fireice-uk fireice-uk merged commit 3550378 into fireice-uk:dev May 16, 2018
@psychocrypt psychocrypt mentioned this pull request May 22, 2018
1 task