About FPU control word settings in 32-bit environment #12125

SakiTakamachi · 2023-09-05T00:36:58Z

Description

It was discovered in this issue:
#12076

Here is the sqlite forum conversation:
https://www.sqlite.org/forum/forumpost/abbb95376ec6cd5f

In summary, even in an environment where the CPU can handle extended double-precision floating point numbers, in-line assembly may limit the use of normal double-precision numbers.

Here is the code:

php-src/Zend/zend_float.h

Line 356 in e2189e5

_xpfpa_fpu_cw = (_xpfpa_fpu_oldcw & ~0x100) | 0x200; \

I think it should be fixed, but should this be fixed in a patch version, or should it be in a minor version and accompanied by an RFC?

Thank you.

PHP Version

php8.2.9

Operating System

i386/alpine docker image

dstogov · 2023-09-07T10:48:33Z

XPFPA_SWITCH_DOUBLE() does exactly what expected - limits the floating point precision to double.

As I understood, the problem occurs because SQLite introduced a faster algorithm that uses long double, but it works incorrectly when precision is limited. (May be I'm wrong).

We can't change this precision because this may cause fragmentation between x86 and other systems. x86_64 uses SSE2 for double that doesn't support long double precision at all.

I suppose we have to use sqlite3_test_control(SQLITE_TESTCTRL_USELONGDOUBLE,0); solution

@iluuu1994 could you please take a look

SakiTakamachi · 2023-09-07T10:55:12Z

@dstogov

XPFPA_SWITCH_DOUBLE() does exactly what expected - limits the floating point precision to double.

As I understood, the problem occurs because SQLite introduced a faster algorithm that uses long double, but it works incorrectly when precision is limited. (May be I'm wrong).

I am of the same understanding. It seems I didn't convey it well, sorry.

According to the sqlite forum, sqlite3_test_control is explained as follows, so it seems doubtful whether it can be a solution.

The SQLITE_TESTCTRL_USELONGDOUBLE sqlite3_test_control() can be used to change algorithms on-the-fly. But that interface is specifically for testing purposes only. You should not use it in your application. The behavior of that interface is subject to change without notice.

iluuu1994 · 2023-09-13T13:35:06Z

Disclaimer that my knowledge of both IEEE 754 and SQLite is limited. Reading https://www.sqlite.org/forum/forumpost/023baad0fdacea43 it sounds like SQLite does not expect FPU precision to be changed, nor would they like to support it, as they consider it a bug on our side (https://www.sqlite.org/forum/forumpost/abc65bec19475130).

I'm not sure how to fix that, other than 1. not changing the FPU mode which would lead to inconsistent results in PHP, 2. switching back for every call into SQLite.

Also worth noting, they say:

Do I understand correctly that you cannot reproduce the problem without the use of in-line ASM?

But _FPU_SETCW when available does not require in-line assembly. There's also _controlfp_s on Windows and fpsetprec on FreeBSD. So I don't think this requires assembly, the assembly is there as a fallback when these APIs aren't available.

Edit: Also, thank you @SakiTakamachi for your extensive analysis. 🙂

SakiTakamachi · 2023-09-13T14:09:24Z

Yeah, it seems like the attitude is that they won't accept anything other than the highest precision that the CPU can handle.

I think the only practical way is to temporarily change the precision before and after using the sqlite API.

This is probably the same as idea 2.

iluuu1994 · 2023-09-13T14:10:57Z

@SakiTakamachi Maybe they will reconsider if aware that there is an API on all major platforms? Maybe let them know in the thread you opened. And it would be great if you could verify in your short reproducer.

SakiTakamachi · 2023-09-13T14:13:50Z

OK, I'll post it in the thread.

iluuu1994 · 2023-09-13T14:37:31Z

@SakiTakamachi Thank you! You should mention _FPU_SETCW on Linux.

SakiTakamachi · 2023-09-13T14:42:17Z

@iluuu1994
Oops, I've edited.

SakiTakamachi · 2023-09-15T11:39:28Z

@iluuu1994

Although it is not definitive as the final conclusion has not yet been reached, it appears from the timeline that a fix for this issue has been adopted.

https://sqlite.org/src/timeline?c=aa999d490b743f45&y=a

iluuu1994 · 2023-09-15T11:41:58Z

@SakiTakamachi That's great, thank you! /cc @andypost

andypost · 2023-09-15T11:55:10Z

Thank you! it means it just need to wait for new sqlite release)

btw I got some reflection of x86 FPU because few tests still fail on x86 (32 bit) https://gitlab.alpinelinux.org/alpine/aports/-/issues/11645#note_104896
The list is https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/community/php82/disabled-tests.x86.list

There was an attempt https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/28340/diffs

SakiTakamachi · 2023-09-15T14:20:08Z

@andypost
Thank you.

In which php version does the test fail? In some cases, it may be easier to understand by opening a new issue.

andypost · 2023-09-15T14:34:50Z

@SakiTakamachi it fails few years, probably starting from 7.4... can't recall exactly - I just disable this 32-bit tests long ago as it became clear that issue not in musl(

I gonna enable them to get fresh run, IIRC few math tests was broken on s390x but over time they already fixed

SakiTakamachi · 2023-09-15T14:38:07Z

@andypost

Ah, I see now. I'll take a look.

SakiTakamachi · 2023-09-15T17:21:47Z

I took a look, and it seems that math.h is also involved, so it's not a simple problem.

Perhaps cos() also seems to ignore the FPU value....

SakiTakamachi · 2023-09-16T02:12:19Z

@andypost

First, I took a closer look at ext/standard/tests/math/cos_basic.phpt.

test code:

#include <stdio.h>
#include <math.h>

int main()
{
    unsigned int oldcw, cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (*&oldcw)); \
    cw = (oldcw & ~0x100) | 0x200; \
    __asm__ __volatile__ ("fldcw %0" : : "m" (*&cw)); \

    double x;
    x = 3.1415926535897931;

    printf("cos(%.16f) = %.16f", x, cos(x));
    return 0;
}

alpine 32bit result:

cos(3.1415926535897932) = -1.0000000002522274

debian 32bit (i386/debian:trixie-slim) result:

cos(3.1415926535897931) = -1.0000000000000000

From these results, I infer that the problem lies in alpine's math.h or musl.

andypost · 2023-09-22T14:18:47Z

As I see the issue is in printf() because even adding

/mnt # cat mm.c 
#include <stdio.h>
#include <math.h>

int main()
{
    double x = 3.1415926535897931;

    printf("pi=%.17f %.16f \n", x, x);

    unsigned int oldcw, cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (*&oldcw)); \
    cw = (oldcw & ~0x100) | 0x200; \
    __asm__ __volatile__ ("fldcw %0" : : "m" (*&cw)); \

    printf("pi=%.17f %.16f \n", x, x);

    return 0;
}
/mnt # gcc mm.c 
/mnt # ./a.out 
pi=3.14159265358979312 3.1415926535897931 
pi=3.14159265358979312 3.1415926535897932

andypost · 2023-09-22T14:19:41Z

Maybe 32-bit just need something like #5621

SakiTakamachi · 2023-09-22T14:34:57Z

What's wrong with this?

3.1415926535897931, 3.1415926535897932, and 3.14159265358979312 are all 0x400921FB54442D18 in IEEE754 and have the same value. It's just that the decimal representation is different.

andypost · 2023-09-22T14:46:23Z

then why without math the last number displayed 2 at the end

SakiTakamachi · 2023-09-22T15:23:02Z

Do you mean it's strange to round ....312 to ....32, right?
Floating point calculations are calculated differently from our human senses.

After rounding, display the "nearest decimal number" at the end.
To begin with, the last digit of 3.14159265358979312 has no reliability. This number itself contains an error because it is displayed as a decimal number.

Most floating point numbers have guaranteed precision to the upper 15 digits. The remaining two digits are visible but unreliable.
Therefore, it is pointless to seek mathematical correctness for the last two digits.

As I wrote above, the IEEE754 representation of these three values is 0x400921FB54442D18.
All values are the same, only the display is different.

It may appear that 3.14159265358979312 is rounded to 3.1415926535897932, but the rounding calculation itself is probably not performed in the first place, and the nearest decimal number of the "specified number of digits" is simply displayed using the "current FPU precision".

In fact, I don't think any rounding is being performed, because no matter how I change the rounding mode under FPU control, the result is the same.

andypost · 2023-09-22T17:47:41Z

Thank you a lot for elaboration! I mentioned musl maintainer about this issue and examples

SakiTakamachi · 2023-09-22T17:55:47Z

I suspect the problem is with math.h and its related implementation of cos().

I'm very interested in this issue, so if you have any progress, please let me know.

andypost · 2023-09-22T18:40:26Z

Probably it could be better to write to musl mailing list but maybe @richfelker can comment here

richfelker · 2023-09-22T19:01:02Z

Calling any function from the std library without satisfying the ABI (that fpu control word is in a state that gives conforming IEEE floating point semantics) yields undefined behavior. I believe I reported this bug in PHP a long time ago but it was ignored...

andypost · 2023-09-22T19:18:47Z

Yes, it was fixed only for x86_64

andypost · 2023-09-22T20:38:02Z

btw I applied patch and all this tests passed https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/52108/diffs

richfelker · 2023-09-23T13:42:12Z

AFAICT that's just disabling the use of custom fpu control word, undermining whatever php was trying to achieve with it. While that use is almost certainly wrong (probably trying to use the low-precisions modes, that don't actually behave like IEEE types, and which induce some nasty UB-like effects), getting rid of it might produce new failures elsewhere. The "right" thing to do is to only switch into the nonconforming mode for execution of your own code that expects the fpu in this state, and to always switch back before returning to (or calling into) code that expects the ABI requirements to be met.

SakiTakamachi · 2023-09-28T06:52:31Z

In any case, all of the remaining issues in this thread are not caused by php.

We can open a new issue if we want, so I close this issue.

andypost · 2023-09-28T08:47:55Z

@SakiTakamachi issue is caused by PHP which is using control in a wrong way, so I have to disable FPU usage to pass tests

andypost · 2023-09-28T08:52:03Z

Same as https://bugs.php.net/bug.php?id=79595 it just hide the problem

SakiTakamachi · 2023-09-28T09:34:13Z

It seems that alpine relies on long double behavior.

I don't think it's a php issue.

SakiTakamachi · 2023-09-28T09:55:44Z

@iluuu1994 @dstogov

What do you think?

andypost · 2023-10-05T19:01:29Z

Here's PR #12368 to backport the fix for 32-bits, at least it allows to pass tests

SakiTakamachi · 2023-10-13T00:13:08Z

@iluuu1994

I confirmed that the problem was resolved with SQLite3.43.2!

iluuu1994 · 2023-10-13T20:38:23Z

@SakiTakamachi Great! Thank you for confirming!

richfelker · 2023-10-13T21:03:33Z

This issue cannot be "resolved" by changing the sqlite version. If the symptom went away, that just means it's re-buried, not resolved. It will not be resolved until PHP either stops fiddling with the FPU control word, or swaps the janky control word value in and out every time it switches between running its own code and calling external library (including libc) code.

SakiTakamachi · 2023-10-14T02:27:57Z

PHP either stops fiddling with the FPU control word

It makes sense to control the FPU if the OS supports that feature. However, I personally think that in-line ASM may be controversial.

swaps the janky control word value in and out every time it switches between running its own code and calling external library (including libc) code

Honestly, I don't think it's realistic because such places exist everywhere in PHP.

richfelker · 2023-10-14T03:06:40Z

It makes sense to control the FPU

It does not make sense on at least two levels, one deeply conceptual, and one very practical.

On the very high conceptual level, making calls with the FPU control word set to something contrary to the ABI requirements voids all contracts you're expecting to be satisfied by the functions you're calling. Everything has undefined behavior. It might sometimes appear to work, or "mostly work", but you have no promises from anyone that anything will work. If you do this and rely on it working, you're doing coding by trial and error on particular versions of particular platforms you think you understand, and it may break at any time.

On the practical level, setting the FPU control word is not doing what you think it is. Folks set it the way PHP is setting it in hopes of getting IEEE double behavior on the x87 fpu. Unfortunately, this is not possible. The "double" mode the x87 has is not actually IEEE double, but carries extra exponent range and extra bits of precision on denormals that should not be there, that will get lost in a hidden double-rounding step whenever an intermediate is spilled to memory rather than being kept in a register. This kind of inconsistent value breaks the assumptions compiler optimizations are built on and can produce all the standard UB effects. But even if nothing does blow up, it's not giving you the exact same results you'd get on a machine with actual IEEE doubles, which is presumably what whoever did this wanted.

if the OS supports that feature.

The OS does not support that feature when calling library functions that have not given you a contract that they'll accept it. It supports letting you set it for your own code, but the compiler doesn't actually support this (see above), so you will get UB-like results sometimes.

However, I personally think that in-line ASM may be controversial.

As far as I'm concerned, being inline assembly is completely non-controversial. Assembly is really the only way you can make use of this feature without bad interactions with the compiler.

Honestly, I don't think it's realistic because such places exist everywhere in PHP.

I don't think you are prepared to evaluate that unless you understand what it's used for and where it's wanted.

SakiTakamachi · 2023-10-14T03:55:57Z

@dstogov

Do you know what the FPU control word is currently used for and what will happen if it is abolished?

As far as I know, it is described in the following RFC for round.

https://wiki.php.net/rfc/rounding

dstogov · 2023-10-16T07:17:18Z

Do you know what the FPU control word is currently used for and what will happen if it is abolished?

We use SSE floating point instruction on X86_64 and regular FPU in x86 (32-bit). SSE precision is limited to 64-bit doubles while FPU may use 80-bit long double.
I suppose, we limited precision of FPU to unify the behaviour of tests in both environments.

SakiTakamachi · 2023-10-16T13:18:20Z

@dstogov

Thank you. I studied it and was able to understand it to some extent.

richfelker · 2023-10-16T14:08:46Z

SSE precision is limited to 64-bit doubles while FPU may use 80-bit long double.
I suppose, we limited precision of FPU to unify the behaviour of tests in both environments.

As I tried to explain above, the behavior is not actually unified. The x87 "64-bit precision" mode is not actually 64-bit and doesn't match IEEE double behavior. The in-register form has 53 mantissa bits like double, but 15 exponent bits like 80-bit extended. So to begin with, this is a different type with different results.

When these frankenstein floats are spilled to memory, however, things get even more messy. The excess exponent range is collapsed at store time by another rounding operation, producing an inconsistent value depending on whether intermediates are kept in registers or need to be saved to the stack and reloaded (e.g. across function calls or if there's too much register pressure).

There is really no good way to get safe and standard double precision evaluation on x86 without either requiring SSE2+ or using soft-float or some complex hybrid soft/hard method that patches up each result.

It probably makes sense to try to figure out what properties you're actually trying to rely on. If there's a hosted-application-facing reason you're trying to match IEEE double semantics perfectly, you're not doing that now, and fixing it is really hard and costly. But if you're just trying to make tests come out as "pass" without arch-specific result expectations, the answer is probably just "don't do that". You get the safest behavior by processing expressions as double_t (which is long double on 32-bit x86, double most other archs) and using double just as a "storage type".

SakiTakamachi · 2023-10-16T14:46:23Z

First of all, let me state that these are my personal opinions.

I personally think that FP is mainly used in two places, the output system and the calculation system.

Since the output system is "closed" within php, there is probably no need to consider it in this case, and the use of external libraries in the calculation system may be the subject of this discussion.

If the calculation system uses an external library and is a 32-bit architecture, it may be relatively easy to temporarily restore FPU control.

However, I don't know exactly what the side effects of such a change would be.

Such a change would likely require an RFC, but I have no idea how well it would be received.

Really FPU issues bother us. There is no doubt about that.

SakiTakamachi added Bug Status: Needs Triage labels Sep 5, 2023

Girgias assigned dstogov Sep 5, 2023

SakiTakamachi closed this as completed Sep 28, 2023

damianwadley added Status: Invalid and removed Status: Needs Triage labels Oct 3, 2023

andypost added a commit to skilld-labs/php-src that referenced this issue Oct 5, 2023

x86 zend_init_fpu() alters FPU precision php#5621 php#12125

5936971

andypost mentioned this issue Oct 5, 2023

x86 zend_init_fpu() alters FPU precision #12368

Open

andypost mentioned this issue Apr 3, 2024

[Chore] Officially support musl the same way glibc is supported #13877

Closed

About FPU control word settings in 32-bit environment #12125

About FPU control word settings in 32-bit environment #12125

Comments

SakiTakamachi commented Sep 5, 2023

Description

PHP Version

Operating System

dstogov commented Sep 7, 2023

SakiTakamachi commented Sep 7, 2023 • edited Loading

iluuu1994 commented Sep 13, 2023 • edited Loading

SakiTakamachi commented Sep 13, 2023 • edited Loading

iluuu1994 commented Sep 13, 2023 • edited Loading

SakiTakamachi commented Sep 13, 2023

iluuu1994 commented Sep 13, 2023

SakiTakamachi commented Sep 13, 2023

SakiTakamachi commented Sep 15, 2023

iluuu1994 commented Sep 15, 2023

andypost commented Sep 15, 2023

SakiTakamachi commented Sep 15, 2023

andypost commented Sep 15, 2023

SakiTakamachi commented Sep 15, 2023

SakiTakamachi commented Sep 15, 2023

SakiTakamachi commented Sep 16, 2023 • edited Loading

andypost commented Sep 22, 2023

andypost commented Sep 22, 2023

SakiTakamachi commented Sep 22, 2023

andypost commented Sep 22, 2023

SakiTakamachi commented Sep 22, 2023 • edited Loading

andypost commented Sep 22, 2023

SakiTakamachi commented Sep 22, 2023

andypost commented Sep 22, 2023

richfelker commented Sep 22, 2023

andypost commented Sep 22, 2023

andypost commented Sep 22, 2023 • edited Loading

richfelker commented Sep 23, 2023

SakiTakamachi commented Sep 28, 2023

andypost commented Sep 28, 2023

andypost commented Sep 28, 2023

SakiTakamachi commented Sep 28, 2023

SakiTakamachi commented Sep 28, 2023

andypost commented Oct 5, 2023

SakiTakamachi commented Oct 13, 2023

iluuu1994 commented Oct 13, 2023

richfelker commented Oct 13, 2023

SakiTakamachi commented Oct 14, 2023

richfelker commented Oct 14, 2023

SakiTakamachi commented Oct 14, 2023

dstogov commented Oct 16, 2023

SakiTakamachi commented Oct 16, 2023

richfelker commented Oct 16, 2023

SakiTakamachi commented Oct 16, 2023

SakiTakamachi commented Sep 7, 2023 •

edited

Loading

iluuu1994 commented Sep 13, 2023 •

edited

Loading

SakiTakamachi commented Sep 13, 2023 •

edited

Loading

iluuu1994 commented Sep 13, 2023 •

edited

Loading

SakiTakamachi commented Sep 16, 2023 •

edited

Loading

SakiTakamachi commented Sep 22, 2023 •

edited

Loading

andypost commented Sep 22, 2023 •

edited

Loading