Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About FPU control word settings in 32-bit environment #12125

Closed
SakiTakamachi opened this issue Sep 5, 2023 · 44 comments
Closed

About FPU control word settings in 32-bit environment #12125

SakiTakamachi opened this issue Sep 5, 2023 · 44 comments

Comments

@SakiTakamachi
Copy link
Member

Description

It was discovered in this issue:
#12076

Here is the sqlite forum conversation:
https://www.sqlite.org/forum/forumpost/abbb95376ec6cd5f

In summary, even in an environment where the CPU can handle extended double-precision floating point numbers, in-line assembly may limit the use of normal double-precision numbers.

Here is the code:

_xpfpa_fpu_cw = (_xpfpa_fpu_oldcw & ~0x100) | 0x200; \

I think it should be fixed, but should this be fixed in a patch version, or should it be in a minor version and accompanied by an RFC?

Thank you.

PHP Version

php8.2.9

Operating System

i386/alpine docker image

@dstogov
Copy link
Member

dstogov commented Sep 7, 2023

XPFPA_SWITCH_DOUBLE() does exactly what expected - limits the floating point precision to double.

As I understood, the problem occurs because SQLite introduced a faster algorithm that uses long double, but it works incorrectly when precision is limited. (May be I'm wrong).

We can't change this precision because this may cause fragmentation between x86 and other systems. x86_64 uses SSE2 for double that doesn't support long double precision at all.

I suppose we have to use sqlite3_test_control(SQLITE_TESTCTRL_USELONGDOUBLE,0); solution

@iluuu1994 could you please take a look

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Sep 7, 2023

@dstogov

XPFPA_SWITCH_DOUBLE() does exactly what expected - limits the floating point precision to double.

As I understood, the problem occurs because SQLite introduced a faster algorithm that uses long double, but it works incorrectly when precision is limited. (May be I'm wrong).

I am of the same understanding. It seems I didn't convey it well, sorry.

According to the sqlite forum, sqlite3_test_control is explained as follows, so it seems doubtful whether it can be a solution.

The SQLITE_TESTCTRL_USELONGDOUBLE sqlite3_test_control() can be used to change algorithms on-the-fly. But that interface is specifically for testing purposes only. You should not use it in your application. The behavior of that interface is subject to change without notice.

@iluuu1994
Copy link
Member

iluuu1994 commented Sep 13, 2023

Disclaimer that my knowledge of both IEEE 754 and SQLite is limited. Reading https://www.sqlite.org/forum/forumpost/023baad0fdacea43 it sounds like SQLite does not expect FPU precision to be changed, nor would they like to support it, as they consider it a bug on our side (https://www.sqlite.org/forum/forumpost/abc65bec19475130).

I'm not sure how to fix that, other than 1. not changing the FPU mode which would lead to inconsistent results in PHP, 2. switching back for every call into SQLite.

Also worth noting, they say:

Do I understand correctly that you cannot reproduce the problem without the use of in-line ASM?

But _FPU_SETCW when available does not require in-line assembly. There's also _controlfp_s on Windows and fpsetprec on FreeBSD. So I don't think this requires assembly, the assembly is there as a fallback when these APIs aren't available.

Edit: Also, thank you @SakiTakamachi for your extensive analysis. 🙂

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Sep 13, 2023

Yeah, it seems like the attitude is that they won't accept anything other than the highest precision that the CPU can handle.

I think the only practical way is to temporarily change the precision before and after using the sqlite API.

This is probably the same as idea 2.

@iluuu1994
Copy link
Member

iluuu1994 commented Sep 13, 2023

@SakiTakamachi Maybe they will reconsider if aware that there is an API on all major platforms? Maybe let them know in the thread you opened. And it would be great if you could verify in your short reproducer.

@SakiTakamachi
Copy link
Member Author

OK, I'll post it in the thread.

@iluuu1994
Copy link
Member

@SakiTakamachi Thank you! You should mention _FPU_SETCW on Linux.

@SakiTakamachi
Copy link
Member Author

@iluuu1994
Oops, I've edited.

@SakiTakamachi
Copy link
Member Author

@iluuu1994

Although it is not definitive as the final conclusion has not yet been reached, it appears from the timeline that a fix for this issue has been adopted.

https://sqlite.org/src/timeline?c=aa999d490b743f45&y=a

@iluuu1994
Copy link
Member

@SakiTakamachi That's great, thank you! /cc @andypost

@andypost
Copy link
Contributor

Thank you! it means it just need to wait for new sqlite release)

btw I got some reflection of x86 FPU because few tests still fail on x86 (32 bit) https://gitlab.alpinelinux.org/alpine/aports/-/issues/11645#note_104896
The list is https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/community/php82/disabled-tests.x86.list

There was an attempt https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/28340/diffs

@SakiTakamachi
Copy link
Member Author

@andypost
Thank you.

In which php version does the test fail? In some cases, it may be easier to understand by opening a new issue.

@andypost
Copy link
Contributor

@SakiTakamachi it fails few years, probably starting from 7.4... can't recall exactly - I just disable this 32-bit tests long ago as it became clear that issue not in musl(

I gonna enable them to get fresh run, IIRC few math tests was broken on s390x but over time they already fixed

@SakiTakamachi
Copy link
Member Author

@andypost

Ah, I see now. I'll take a look.

@SakiTakamachi
Copy link
Member Author

I took a look, and it seems that math.h is also involved, so it's not a simple problem.

Perhaps cos() also seems to ignore the FPU value....

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Sep 16, 2023

@andypost

First, I took a closer look at ext/standard/tests/math/cos_basic.phpt.

test code:

#include <stdio.h>
#include <math.h>

int main()
{
    unsigned int oldcw, cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (*&oldcw)); \
    cw = (oldcw & ~0x100) | 0x200; \
    __asm__ __volatile__ ("fldcw %0" : : "m" (*&cw)); \

    double x;
    x = 3.1415926535897931;

    printf("cos(%.16f) = %.16f", x, cos(x));
    return 0;
}

alpine 32bit result:

cos(3.1415926535897932) = -1.0000000002522274

debian 32bit (i386/debian:trixie-slim) result:

cos(3.1415926535897931) = -1.0000000000000000

From these results, I infer that the problem lies in alpine's math.h or musl.

@andypost
Copy link
Contributor

As I see the issue is in printf() because even adding

/mnt # cat mm.c 
#include <stdio.h>
#include <math.h>

int main()
{
    double x = 3.1415926535897931;

    printf("pi=%.17f %.16f \n", x, x);

    unsigned int oldcw, cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (*&oldcw)); \
    cw = (oldcw & ~0x100) | 0x200; \
    __asm__ __volatile__ ("fldcw %0" : : "m" (*&cw)); \

    printf("pi=%.17f %.16f \n", x, x);

    return 0;
}
/mnt # gcc mm.c 
/mnt # ./a.out 
pi=3.14159265358979312 3.1415926535897931 
pi=3.14159265358979312 3.1415926535897932 

@andypost
Copy link
Contributor

Maybe 32-bit just need something like #5621

@SakiTakamachi
Copy link
Member Author

What's wrong with this?

3.1415926535897931, 3.1415926535897932, and 3.14159265358979312 are all 0x400921FB54442D18 in IEEE754 and have the same value. It's just that the decimal representation is different.

@andypost
Copy link
Contributor

then why without math the last number displayed 2 at the end

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Sep 22, 2023

Do you mean it's strange to round ....312 to ....32, right?
Floating point calculations are calculated differently from our human senses.

After rounding, display the "nearest decimal number" at the end.
To begin with, the last digit of 3.14159265358979312 has no reliability. This number itself contains an error because it is displayed as a decimal number.

Most floating point numbers have guaranteed precision to the upper 15 digits. The remaining two digits are visible but unreliable.
Therefore, it is pointless to seek mathematical correctness for the last two digits.


As I wrote above, the IEEE754 representation of these three values ​​is 0x400921FB54442D18.
All values ​​are the same, only the display is different.

It may appear that 3.14159265358979312 is rounded to 3.1415926535897932, but the rounding calculation itself is probably not performed in the first place, and the nearest decimal number of the "specified number of digits" is simply displayed using the "current FPU precision".

In fact, I don't think any rounding is being performed, because no matter how I change the rounding mode under FPU control, the result is the same.

@andypost
Copy link
Contributor

Thank you a lot for elaboration! I mentioned musl maintainer about this issue and examples

@SakiTakamachi
Copy link
Member Author

I suspect the problem is with math.h and its related implementation of cos().

I'm very interested in this issue, so if you have any progress, please let me know.

@andypost
Copy link
Contributor

Probably it could be better to write to musl mailing list but maybe @richfelker can comment here

@richfelker
Copy link

Calling any function from the std library without satisfying the ABI (that fpu control word is in a state that gives conforming IEEE floating point semantics) yields undefined behavior. I believe I reported this bug in PHP a long time ago but it was ignored...

@andypost
Copy link
Contributor

@andypost
Copy link
Contributor

andypost commented Sep 22, 2023

btw I applied patch and all this tests passed https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/52108/diffs

@richfelker
Copy link

AFAICT that's just disabling the use of custom fpu control word, undermining whatever php was trying to achieve with it. While that use is almost certainly wrong (probably trying to use the low-precisions modes, that don't actually behave like IEEE types, and which induce some nasty UB-like effects), getting rid of it might produce new failures elsewhere. The "right" thing to do is to only switch into the nonconforming mode for execution of your own code that expects the fpu in this state, and to always switch back before returning to (or calling into) code that expects the ABI requirements to be met.

@SakiTakamachi
Copy link
Member Author

In any case, all of the remaining issues in this thread are not caused by php.

We can open a new issue if we want, so I close this issue.

@andypost
Copy link
Contributor

@SakiTakamachi issue is caused by PHP which is using control in a wrong way, so I have to disable FPU usage to pass tests

@andypost
Copy link
Contributor

Same as https://bugs.php.net/bug.php?id=79595 it just hide the problem

@SakiTakamachi
Copy link
Member Author

It seems that alpine relies on long double behavior.

I don't think it's a php issue.

@SakiTakamachi
Copy link
Member Author

@iluuu1994 @dstogov

What do you think?

@andypost
Copy link
Contributor

andypost commented Oct 5, 2023

Here's PR #12368 to backport the fix for 32-bits, at least it allows to pass tests

@SakiTakamachi
Copy link
Member Author

@iluuu1994

I confirmed that the problem was resolved with SQLite3.43.2!

@iluuu1994
Copy link
Member

@SakiTakamachi Great! Thank you for confirming!

@richfelker
Copy link

This issue cannot be "resolved" by changing the sqlite version. If the symptom went away, that just means it's re-buried, not resolved. It will not be resolved until PHP either stops fiddling with the FPU control word, or swaps the janky control word value in and out every time it switches between running its own code and calling external library (including libc) code.

@SakiTakamachi
Copy link
Member Author

PHP either stops fiddling with the FPU control word

It makes sense to control the FPU if the OS supports that feature. However, I personally think that in-line ASM may be controversial.

swaps the janky control word value in and out every time it switches between running its own code and calling external library (including libc) code

Honestly, I don't think it's realistic because such places exist everywhere in PHP.

@richfelker
Copy link

It makes sense to control the FPU

It does not make sense on at least two levels, one deeply conceptual, and one very practical.

On the very high conceptual level, making calls with the FPU control word set to something contrary to the ABI requirements voids all contracts you're expecting to be satisfied by the functions you're calling. Everything has undefined behavior. It might sometimes appear to work, or "mostly work", but you have no promises from anyone that anything will work. If you do this and rely on it working, you're doing coding by trial and error on particular versions of particular platforms you think you understand, and it may break at any time.

On the practical level, setting the FPU control word is not doing what you think it is. Folks set it the way PHP is setting it in hopes of getting IEEE double behavior on the x87 fpu. Unfortunately, this is not possible. The "double" mode the x87 has is not actually IEEE double, but carries extra exponent range and extra bits of precision on denormals that should not be there, that will get lost in a hidden double-rounding step whenever an intermediate is spilled to memory rather than being kept in a register. This kind of inconsistent value breaks the assumptions compiler optimizations are built on and can produce all the standard UB effects. But even if nothing does blow up, it's not giving you the exact same results you'd get on a machine with actual IEEE doubles, which is presumably what whoever did this wanted.

if the OS supports that feature.

The OS does not support that feature when calling library functions that have not given you a contract that they'll accept it. It supports letting you set it for your own code, but the compiler doesn't actually support this (see above), so you will get UB-like results sometimes.

However, I personally think that in-line ASM may be controversial.

As far as I'm concerned, being inline assembly is completely non-controversial. Assembly is really the only way you can make use of this feature without bad interactions with the compiler.

Honestly, I don't think it's realistic because such places exist everywhere in PHP.

I don't think you are prepared to evaluate that unless you understand what it's used for and where it's wanted.

@SakiTakamachi
Copy link
Member Author

@dstogov

Do you know what the FPU control word is currently used for and what will happen if it is abolished?

As far as I know, it is described in the following RFC for round.

https://wiki.php.net/rfc/rounding

@dstogov
Copy link
Member

dstogov commented Oct 16, 2023

Do you know what the FPU control word is currently used for and what will happen if it is abolished?

We use SSE floating point instruction on X86_64 and regular FPU in x86 (32-bit). SSE precision is limited to 64-bit doubles while FPU may use 80-bit long double.
I suppose, we limited precision of FPU to unify the behaviour of tests in both environments.

@SakiTakamachi
Copy link
Member Author

@dstogov

Thank you. I studied it and was able to understand it to some extent.

@richfelker
Copy link

SSE precision is limited to 64-bit doubles while FPU may use 80-bit long double.
I suppose, we limited precision of FPU to unify the behaviour of tests in both environments.

As I tried to explain above, the behavior is not actually unified. The x87 "64-bit precision" mode is not actually 64-bit and doesn't match IEEE double behavior. The in-register form has 53 mantissa bits like double, but 15 exponent bits like 80-bit extended. So to begin with, this is a different type with different results.

When these frankenstein floats are spilled to memory, however, things get even more messy. The excess exponent range is collapsed at store time by another rounding operation, producing an inconsistent value depending on whether intermediates are kept in registers or need to be saved to the stack and reloaded (e.g. across function calls or if there's too much register pressure).

There is really no good way to get safe and standard double precision evaluation on x86 without either requiring SSE2+ or using soft-float or some complex hybrid soft/hard method that patches up each result.

It probably makes sense to try to figure out what properties you're actually trying to rely on. If there's a hosted-application-facing reason you're trying to match IEEE double semantics perfectly, you're not doing that now, and fixing it is really hard and costly. But if you're just trying to make tests come out as "pass" without arch-specific result expectations, the answer is probably just "don't do that". You get the safest behavior by processing expressions as double_t (which is long double on 32-bit x86, double most other archs) and using double just as a "storage type".

@SakiTakamachi
Copy link
Member Author

First of all, let me state that these are my personal opinions.

I personally think that FP is mainly used in two places, the output system and the calculation system.

Since the output system is "closed" within php, there is probably no need to consider it in this case, and the use of external libraries in the calculation system may be the subject of this discussion.

If the calculation system uses an external library and is a 32-bit architecture, it may be relatively easy to temporarily restore FPU control.

However, I don't know exactly what the side effects of such a change would be.

Such a change would likely require an RFC, but I have no idea how well it would be received.

Really FPU issues bother us. There is no doubt about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants