Fix sceKernelUnlockMutex timing #15930
Conversation
That seems fine.
Hm, when a mutex is unlocked it should run another thread. I'm worried this might not be delaying the right thread, even if it appears to work. Does the game also work if you eat cycles here instead of doing a thread delay? The log only shows when user_main returns, but presumably the mutex unlock woke up some other thread, which did stuff. This could actually mean that the other thread called some other function that should've taken longer. -[Unknown]
Hm, that's a good point. EatCycles might make more sense. I do think that mutex unlocks are going to take a bunch of cycles either way, definitely not zero-cost, and in 150 cycles I think there's not really enough time to switch to another thread and do something that blocks. Just the cost of switching threads twice has to be up there around 150, what with all the register saving and loading... (though I'm not sure how well we account for that).
I think the invalid address is due to the "CRI FS Data Decompression 0" thread.
We do account for that in thread switching, based on some tests. This delay here is actually 33300 cycles, depending on CPU speed (it's delaying the thread for 150 microseconds). It's true this function might take insufficient time. My bigger concern is that this doesn't seem like it could possibly be handling the correct behavior of a better-priority thread unlocking a mutex that a worse-priority thread is waiting on. This should not delay the thread that unlocked the mutex, and it should not reschedule to the worse-priority thread. Lots of tests confirmed this. Unless I'm missing something, this change makes that happen. I'm actually surprised this passes our pspautotests, so maybe hleDelayResult() is buggy in some way that happens to work out here mostly when combined with hleReSchedule(). Worried this might break Radiant Mythology or something. -[Unknown]
I created a quick test (will send to pspautotests), which does show this behavior is incorrect. Simply set flags: 1 for the worse-priority thread, 2 for the same-priority thread, and 4 for the better-priority thread.

--- PSP output
+++ PPSSPP output (after #15930)
@@ -1,3 +1,3 @@
-[x/] Unlock test
-[x/157] Unlocked, ran: 4
-[r/322] Delayed, ran: 7
+[x/] Unlock test
+[r/180] Unlocked, ran: 7
+[r/250] Delayed, ran: 7

On a real PSP, the entire unlock process (including running the same-priority thread) takes 157us total. It does not reschedule for anything but the better-priority thread (4), and returns to the current thread before executing any other same-priority thread (which is why it's not 6, i.e. 4 + 2, and also why it shows 'x' rather than 'r' before the time). However, this change makes all the threads (same and worse priority) run immediately on unlock, which is incorrect and not how mutexes are supposed to work (just generally, even outside PSP's threadman library). Before this change:

--- PSP output
+++ PPSSPP output (before #15930)
@@ -1,3 +1,3 @@
-[x/] Unlock test
-[x/157] Unlocked, ran: 4
-[r/322] Delayed, ran: 7
+[x/] Unlock test
+[x/49] Unlocked, ran: 4
+[r/250] Delayed, ran: 7 Indeed, the unlock is too fast and should take longer. However, outside the time, the results were correct before as far as which threads were scheduled. -[Unknown] |
Also, here's another run recording more detail for timing, minimizing overhead noise that could make it wrong:
This means the unlock "completed" in 48 microseconds, by which point the better-priority thread had already run. The unlock was complete after 83 microseconds (not including overhead). Without this PR, the results are:

--- PSP results
+++ PPSSPP results (before #15930)
@@ -2,8 +2,8 @@
[x] Unlocked, ran: 4
[r] Delayed, ran: 7
Times:
- better: 48
- unlock done: 83
- same: 182
- worse: 250
- done: 384
+ better: 13
+ unlock done: 27
+ same: 72
+ worse: 98
+ done: 275

These were the changes (not committing because we'll never match exact times, so it makes for a useless test):

diff --git a/tests/threads/mutex/unlock2.cpp b/tests/threads/mutex/unlock2.cpp
index 45b3513..37b7492 100644
--- a/tests/threads/mutex/unlock2.cpp
+++ b/tests/threads/mutex/unlock2.cpp
@@ -5,6 +5,7 @@ extern "C" {
struct ThreadArg {
SceUID id;
int bits;
+ u64 unlockTime;
};
struct KernelThread {
@@ -18,7 +19,7 @@ struct KernelThread {
}
template <typename T>
- void Start(T &arg) {
+ void Start(T *arg) {
sceKernelStartThread(id_, sizeof(arg), &arg);
}
@@ -28,10 +29,11 @@ struct KernelThread {
static int threadsRun = 0;
static int waitLock(SceSize argSize, void *argPointer) {
- ThreadArg *arg = (ThreadArg *)argPointer;
+ ThreadArg *arg = *(ThreadArg **)argPointer;
SceUInt timeout = 1000;
schedulingResult = sceKernelLockMutex(arg->id, 1, &timeout);
threadsRun |= arg->bits;
+ arg->unlockTime = sceKernelGetSystemTimeWide();
sceKernelUnlockMutex(arg->id, 1);
return 0;
}
@@ -53,11 +55,11 @@ extern "C" int main(int argc, char **argv) {
}
ThreadArg worseArg = { id, 1 };
- worseThread.Start(worseArg);
+ worseThread.Start(&worseArg);
ThreadArg sameArg = { id, 2 };
- sameThread.Start(sameArg);
+ sameThread.Start(&sameArg);
ThreadArg betterArg = { id, 4 };
- betterThread.Start(betterArg);
+ betterThread.Start(&betterArg);
if (threadsRun != 0) {
printf("Test failure: threads not blocked by lock\n");
@@ -69,12 +71,22 @@ extern "C" int main(int argc, char **argv) {
//CHECKPOINT_ENABLE_TIME = 1;
checkpointNext("Unlock test");
+ u64 beforeUnlock = sceKernelGetSystemTimeWide();
sceKernelUnlockMutex(id, 1);
int immediately = threadsRun;
+ u64 unlocked = sceKernelGetSystemTimeWide();
checkpoint(" Unlocked, ran: %x", immediately);
sceKernelDelayThread(200);
+ u64 done = sceKernelGetSystemTimeWide();
checkpoint(" Delayed, ran: %x", threadsRun);
+ schedf("Times:\n");
+ schedf(" better: %lld\n", betterArg.unlockTime - beforeUnlock);
+ schedf(" unlock done: %lld\n", unlocked - beforeUnlock);
+ schedf(" same: %lld\n", sameArg.unlockTime - beforeUnlock);
+ schedf(" worse: %lld\n", worseArg.unlockTime - beforeUnlock);
+ schedf(" done: %lld\n", done - beforeUnlock);
+
sceKernelDeleteMutex(id);
return 0;

-[Unknown]
hleEatCycles(6000); or hleEatCycles(33300); also don't work.
It's probably that something else should be causing the other thread to run before the crash happens, but sceKernelUnlockMutex isn't the right function. It must be some other thing before the crash. The PSP OS just runs the best-priority thread at all times, as long as it's not sleeping/waiting. So it would be good to compare the priority of the thread which called sceKernelUnlockMutex and the thread that needs to run to prevent the crash. Lower values are better. If the thread that needs to run has a better (lower value) priority than the current thread, it should've run without this change. -[Unknown]
Fix #6557
Matches the real PSP log:
20:29:15.097838 user_main - -> sceKernelUnlockMutex 0x3EFE17F, 0x1 = 0x0
20:29:15.097979 user_main - <- sceKernelUnlockMutex 0x3EFE17F, 0x1 = 0x0