Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync/atomic: add Uint64Pair #61236

Open
haraldrudell opened this issue Jul 8, 2023 · 36 comments
Open

sync/atomic: add Uint64Pair #61236

haraldrudell opened this issue Jul 8, 2023 · 36 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted Proposal Proposal-Accepted
Milestone

Comments

@haraldrudell
Copy link

I suggest CAS2 is implemented in the atomic package similar to atomic.align64, operational for hardware that supports it.

possibly also:

  • atomic OR
  • LL/SC Load-link/store-conditional

  • With the 2022 memory model, go1.19 atomics became useful after years of unproductive near-sighted discussions
  • However, wait-free algorithms now use CAS2, ie. 128-bit atomic, which is supported on most cpus but not in Go
  • implementing CAS spinners in lock-free applications produces long thread wait-times or 100% cpu, so wait-free is better: threads are neither suspended or spinning. We will have 100 cores and 100 GiB RAM from Apple soon enough
  • what is desired is a performant wait-free queue and wait-free free-list stack
  • fast-path of wCQ Nikolaev Ravindran 2022 and wfq Yang Mellor-Crummey 2016
  • this would enable an initial thread-pool to have wait-free and allocation-free data sink

darwin-arm64 darwin-amd64 linux-amd64

I think also atomic generics could do with another polish
Like this AtomicMax that is more flexible in what underlying integer types and named types can be used: https://github.com/haraldrudell/parl/blob/main/atomic-max.go

  • by inventing align64 only in atomic, 64-bit things must go in atomic
@gopherbot gopherbot added this to the Proposal milestone Jul 8, 2023
@seankhliao seankhliao changed the title proposal: support CAS2 in Go affected/package: atomic proposal: sync/atomic: support CAS2 Jul 8, 2023
@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 10, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Jul 12, 2023
@ianlancetaylor
Copy link
Member

I think the proposal here is

// CompareAndSwapUint128 executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint128(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

But see also #9455. If we adopt that, then this version of the function would no longer make sense. So perhaps the proposal here should be

// CompareAndSwapUint64Pair executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint64Pair(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

Presumably hardware does not have a 128-bit CAS operation could implement this using something like the lock table in runtime/internal/atomic/atomic_arm.go.

@ianlancetaylor ianlancetaylor changed the title proposal: sync/atomic: support CAS2 proposal: sync/atomic: add CompareAndSwapUint128 Jul 12, 2023
@randall77
Copy link
Contributor

Perhaps

func CompareAndSwapUint64Pair(addr *[2]uint64, old, new [2]uint64) (swapped bool)

On x86 at least, the hardware instructionCMPXCHG16B requires its argument to be 16-byte aligned. I'm not sure how we would enforce that. It sounds like #599 all over again.

@merykitty
Copy link

merykitty commented Jul 12, 2023

We could have atomic.Uint64Pair which can be enforced specially by the compiler to have 16-byte alignment. An API point to operate on arbitrary pairs of uint64 seems improbable.

@haraldrudell
Copy link
Author

  1. CAS2 atomic.CompareAndSwap is one required 128-bit instruction for wait-Free 2016+
  2. The other is fetch-and-add FAA atomic.Add
  3. LL/SC is only used if the other 2 do not exist and is actually more complicated, likely for some unusual platforms that are not arm64 or amd64
  4. If there is none of that, it’s slow-path

There is a Go package that uses array of uint64, [2]uint64 in its api:
github.com/CAFxX/atomic128

Generally, the data used is not a 128-bit number but a composite of often two 64-bit numbers that are changed atomically as a pair

@CAFxX
Copy link
Contributor

CAFxX commented Jul 17, 2023

There is a Go package that uses array of uint64, [2]uint64 in its api: https://github.com/CAFxX/atomic128

FTR this was just out of necessity (due to the lack of uint128) more than anything else. It's true that both intel and arm define their DWCAS instructions as operating on pairs of 64-bit values, but that does not automatically mean we should expose that in our APIs (especially given the alignment requirements that are uncharacteristic of uint64s)

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

Which systems would support this function?

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Aug 9, 2023
@CAFxX
Copy link
Contributor

CAFxX commented Aug 10, 2023

Which systems would support this function?

Natively: I'm aware of at least amd64 with CMPXCHG16B (AMD64v2 and above) and aarch64/ARMv8.1 with LSE

Others will have to be emulated (I just use a mutex in my library in this case, but other implementations are possible e.g. for aarch64 there are DW LL/SC instructions that were available before the native DWCAS)

@rsc
Copy link
Contributor

rsc commented Aug 16, 2023

We don't want to have an API that's unavailable or always fails on 32-bit systems, so what do we do on 32-bit systems?
If we use a lock on 32-bit systems then we need to provide ways for code to read the 64-bit halves or the whole 128-bit value. (Currently there's no way to find out what the value has!)

Perhaps the right path forward is to define a type wrapper like atomic.Uint64 and not the un-wrapped one.

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() [2]uint64
func (x *Uint64Pair) Store(new [2]uint64) 
func (x *Uint64Pair) CompareAndSwap(old, new [2]uint64) bool
func (x *Uint64Pair) Swap(new [2]uint64) (old [2]uint64)

But there would not be corresponding top-level functions, to avoid direct access to the atomic uint64 values.

Edit: Added Swap.

@someview
Copy link

someview commented Aug 26, 2023

How about add something like MarkableReference and TimeStampedReference in java,just extend the val :

func CompareAndSwapUint64Pointer(oldPtr unsafe.Pointer, newPtr unsafe.Pointer, new1, new2 uint64) (swapped bool)

This is useful for lock free data structure:
hashmap.add

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

@someview
Copy link

If cas128 must support

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

This may be double-word CAS,not atomic128 with 32-bit system

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

It sounds like people are generally happy with #61236 (comment).
Have all remaining concerns about this proposal been addressed?

@randall77
Copy link
Contributor

We probably want Swap also, the other types have that.

@rsc
Copy link
Contributor

rsc commented Aug 31, 2023

Added Swap, thanks.

@CAFxX
Copy link
Contributor

CAFxX commented Aug 31, 2023

I suspect quite a few uses of a DWCAS would be to deal with pointers... Should we consider also PointerPair/[2]unsafe.Pointer in this proposal (with the caveat that the size will differ between 32 and 64 bit archs), or should we leave it for later?

@ianlancetaylor
Copy link
Member

I suggest that PointerPair be a different proposal, since, as you say, it's not necessarily a 128-bit issue. People might in principle be running into it today on 32-bit systems. That separate proposal should ideally point out some algorithms that would benefit from a PointerPair implementation. Thanks.

@rsc
Copy link
Contributor

rsc commented Sep 7, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals Sep 7, 2023
@rsc
Copy link
Contributor

rsc commented Oct 3, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Active in Proposals Oct 3, 2023
@rsc
Copy link
Contributor

rsc commented Oct 4, 2023

Right now it looks like the proposal is to add three types with this API:

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() (uint64, uint64)
func (x *Uint64Pair) Store(uint64, uint64) 
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

type PointerPair[T1, T2 any] struct { ... unexported ... }
func (x *PointerPair[T1, T2]) Load() (*T1, *T2)
func (x *PointerPair[T1, T2]) Store(new1 *T1, new2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

type Uint64AndPointer[T1 any] struct { ... unexported ... }
func (x *Uint64AndPointer[T1]) Load() (uint64, *T1)
func (x *Uint64AndPointer[T1]) Store(uint64, *T1)
func (x *Uint64AndPointer[T1]) CompareAndSwap(old1 uint64, old2 *T1, new1 uint64, new2 *T2) bool
func (x *Uint64AndPointer[T1]) Swap(new1 uint64, new2 *T1) (old1 uint64, old2 *T1)

And maybe also UintptrAndPointer.

The arrays are gone because they only apply in Uint64Pair.

It's not clear if this is the road we want to go down, but I don't see another one.

@rsc
Copy link
Contributor

rsc commented Oct 11, 2023

This is the API we think we'd use. Is there anyone who has a use case that isn't addressed by this? And is there anyone who has a use case that is addressed by this? (We want to make sure we're not adding something no one needs.) Thanks!

@zephyrtronium
Copy link
Contributor

PointerPair with CAS makes it straightforward to implement an efficient lock-free linked list with arbitrary elements, which directly addresses a use case I have. Uint64AndPointer does the same for uint64 elements, which is probably still useful. It's a bit of a shame that having them as separate types means that the same type can't provide both varieties of linked list, but I'm not sure how common the latter is anyway.

If @haraldrudell or someone else could check that my reading of wCQ in #61236 (comment) is correct, that would confirm another use case for Uint64AndPointer.

@rsc
Copy link
Contributor

rsc commented Oct 26, 2023

Have all remaining concerns about this proposal been addressed?

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc
Copy link
Contributor

rsc commented Nov 1, 2023

Note that like with atomic.Int64, the compiler and toolchain will provide the necessary alignment automatically, so there's no need to worry about it or mention it here.

@rsc
Copy link
Contributor

rsc commented Nov 2, 2023

This seems like a likely accept but we may well not get to it until Go 1.23 at this point.

@rsc
Copy link
Contributor

rsc commented Nov 2, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc rsc moved this from Active to Likely Accept in Proposals Nov 2, 2023
@rsc rsc moved this from Likely Accept to Accepted in Proposals Nov 10, 2023
@rsc
Copy link
Contributor

rsc commented Nov 10, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc rsc changed the title proposal: sync/atomic: add Uint64Pair sync/atomic: add Uint64Pair Nov 10, 2023
@rsc rsc modified the milestones: Proposal, Backlog Nov 10, 2023
@Zheng-Xu
Copy link
Contributor

Zheng-Xu commented Nov 13, 2023

In https://pkg.go.dev/sync/atomic#pkg-note-BUG , it is said

On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

So far, we don't have 16-byte aligned data structure in Go. Will we provide the capability to cast something else to *Uint64Pair, for example the address of a struct to *Uint64Pair?

More specifically, can we guarantee below Load() is atomic?

var comp complex128
(*atomic. Uint64Pair)(unsafe.Pointer(&comp)).Load()

@cherrymui
Copy link
Member

@Zheng-Xu as commented in #61236 (comment) , atomic. Uint64Pair will be properly aligned, to 16-byte if the architecture's atomic operations require it. That said, the alignment of complex128 is not changed, so that unsafe conversion is not safe.

@mauri870
Copy link
Member

mauri870 commented Dec 8, 2023

I would be happy to work on this for the 1.23 cycle, given that I have previous experience implementing internal and sync/atomic APIs. I'll keep it assigned unless someone else is interested.

@reusee
Copy link

reusee commented Dec 2, 2024

Just out of curiosity, any progress update on the implementation?

@mauri870
Copy link
Member

mauri870 commented Dec 2, 2024

Just out of curiosity, any progress update on the implementation?

Thanks for following up! Since we've already reached the freeze for Go 1.24, this will likely only make it into the 1.25 release. Unfortunately, I haven't had the time to work on this during the current release cycle, but it's still on my radar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests