Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"distinct uint64" type corruption on 32-bit, when using {.borrow.} operators #13902

Closed
stefantalpalaru opened this issue Apr 6, 2020 · 12 comments · Fixed by #13907
Closed

"distinct uint64" type corruption on 32-bit, when using {.borrow.} operators #13902

stefantalpalaru opened this issue Apr 6, 2020 · 12 comments · Fixed by #13907

Comments

@stefantalpalaru
Copy link
Contributor

The problem only appears when borrowed operators for a "distinct uint64" type force a type inference.

Example

type Slot = distinct uint64

var s = Slot(1)

proc `$`*(x: Slot): string {.borrow.}
proc `+=`*(x: var Slot, y: uint64) {.borrow.}

echo "s = ", s
s += 1
echo "s = ", s

Compiling and running a 32-bit binary on a 64-bit Linux system: nim c -r --cpu:i386 --passC:"-m32" --passL:-m32 uint64.nim

Current Output

s = 1
s = 6223254620873949186

Expected Output

s = 1
s = 2

Additional Information

Found when porting nim-beacon-chain to Nim-1.2.0: status-im/nimbus-eth2#829

$ nim -v
Nim Compiler Version 1.2.0 [Linux: amd64]
Compiled at 2020-04-03
Copyright (c) 2006-2020 by Andreas Rumpf

active boot switches: -d:release
@stefantalpalaru
Copy link
Contributor Author

It seems to only apply to borrowed operators with var params. This one works:

proc `+`*(x: Slot, y: uint64): Slot {.borrow.}

@mratsim
Copy link
Collaborator

mratsim commented Apr 7, 2020

This used to work in 1.0.6 and appeared in 1.2.0

@stefantalpalaru
Copy link
Contributor Author

Regression introduced by this commit: 19cab9f

Paging @timotheecour.

What looks suspicious in there is that a new "frameMsgLen" field is added unconditionally to struct TFrame_ in "nimbase.h" but then it's added conditionally to TFrame in "system.nim".

Adding --stacktraceMsgs:on to nim's command line does not fix the problem, but commenting out that extra field in "nimbase.h" does.

I thought adding the missing struct field to this anonymous struct copy of TFrame_ would help, but it doesn't:

struct {TFrame* prev;NCSTRING procname;NI line;NCSTRING filename; NI len; VarSlot s[slots];} FR_; \

I'm out of ideas for what could cause a wrong length assumption for that struct.

@alaviss
Copy link
Collaborator

alaviss commented Apr 7, 2020

I don't think that's the problem. Your problem lies in the C code:

	nimln_(9, "/tmp/test.nim");	if (nimAddInt(s__1PREurDxZ5w0Xbv7yMs9aWw, 1ULL, &TM__ipcYmBC9bj9a1BW35ABoB1Kw_3)) { raiseOverflow(); };	s__1PREurDxZ5w0Xbv7yMs9aWw = (NU64)(TM__ipcYmBC9bj9a1BW35ABoB1Kw_3);

This code is used to add integers, and on 32bit systems, they'll be treated as 32bit ints. That's where the corruption came from.

This regression was introduced and/or surfaced by @Araq's #13626.

@stefantalpalaru
Copy link
Contributor Author

I used git bisect to find the first commit with the regression.

@timotheecour
Copy link
Member

i will look into it in a bit, today

@alaviss
Copy link
Collaborator

alaviss commented Apr 7, 2020

Guess I catched another regression: distinct uints still have their bounds checked.

@timotheecour
Copy link
Member

timotheecour commented Apr 7, 2020

I used git bisect to find the first commit with the regression.

I believe you've used git bisect incorrectly :-)
I ran git bisect and it found instead exactly the commit I was suspecting: a6682de catchable defects (#13626)
(which, like all great PRs, can sometimes introduce regressions)

/cc @Araq

(and also verified manually right before and right after this commit)

btw, maybe it's time to look more closely into the other comments that have been marked as resolved but IMO are still valid on that PR (even if some of these refer to pre-existing issues)

These concern bound checking codegen issues and potential double evaluation bugs/performance bugs, eg #13626 (comment) amongst others

@stefantalpalaru
Copy link
Contributor Author

stefantalpalaru commented Apr 7, 2020

I believe you've used git bisect incorrectly :-)

Doubt it. I started from 6e0c06f mentioned by leorize as working, on IRC, then manually verified the result by checking out 8088633 which works and 19cab9f which fails.

It may be possible that the problem appeared and disappeared as the stack changed in unrelated places, but keep in mind that it's solved by removing "frameMsgLen" from struct TFrame_.

@timotheecour
Copy link
Member

It may be possible that the problem appeared and disappeared as the stack changed in unrelated places, but keep in mind that it's solved by removing "frameMsgLen" from struct TFrame_.

--stacktrace:off would exhibit the bug with 1.2.0 so I can't see how #13351 could relate. The root cause is #13626 which has buggy semantics for things like mInc (+=) for distinct unsigned, which could trigger in #13351 but not because of it. Note that it also triggers or not depending on random unrelated code modifications happening between #13626 <= here < #13351 so again, it should be unrelated to that.

I found the bug and am fixing it in #13907 (just a 1 line change, plus a test).

note that it also fixes a (related) cpp codegen compilation error

@Araq Araq added Showstopper and removed Severe labels Apr 7, 2020
@Araq
Copy link
Member

Araq commented Apr 7, 2020

I have said it elsewhere, "git bisect" can have false positives and other ways for bug hunting are much more effective. Like, I dunno, trying to understand what's actually going on.

@timotheecour
Copy link
Member

timotheecour commented Apr 7, 2020

I have said it elsewhere, "git bisect" can have false positives

It's about reducing the search space.
In large majority of cases there is a single good=>bad transition even if you have intermediate broken commits (eg because a PR was not squashed). When there are multiple (accidental) good=>bad transition, either the git bisect run test can be made more strict, or you can still automate the task of finding the first transition, even if, strictly speaking, that's O(N) worst case complexity in this case (but that would be rare).

it's trivial to write a nim wrapper on top of git bisect run that does exactly that. The point is, it's automated, and for all but the simplest bugs it helps.

Araq pushed a commit that referenced this issue Apr 8, 2020
) [backport:1.2]

* fix #13902 distinct uint64 type corruption on 32-bit with borrow

Co-authored-by: Timothee Cour <[email protected]>
narimiran pushed a commit that referenced this issue Apr 14, 2020
) [backport:1.2]

* fix #13902 distinct uint64 type corruption on 32-bit with borrow

Co-authored-by: Timothee Cour <[email protected]>
(cherry picked from commit 95fd8ae)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants