node: support reloading node attributes with SIGHUP #3005

Merged

Conversation

End-rey
Contributor

@End-rey End-rey commented Nov 8, 2024

Closes #1870.

Also fixes a bug from #2998 that incorrectly checked when to reconnect.


codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 17.72152% with 65 lines in your changes missing coverage. Please review.

Project coverage is 22.85%. Comparing base (f1b6982) to head (d430daa).
Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/neofs-node/netmap.go | 0.00% | 39 Missing ⚠️ |
| cmd/neofs-node/config.go | 0.00% | 13 Missing ⚠️ |
| cmd/neofs-node/attributes.go | 70.00% | 6 Missing ⚠️ |
| cmd/neofs-node/container.go | 0.00% | 3 Missing ⚠️ |
| pkg/core/object/fmt.go | 0.00% | 2 Missing and 1 partial ⚠️ |
| pkg/morph/client/reload.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3005      +/-   ##
==========================================
- Coverage   22.85%   22.85%   -0.01%     
==========================================
  Files         791      791              
  Lines       58603    58684      +81     
==========================================
+ Hits        13395    13411      +16     
- Misses      44312    44376      +64     
- Partials      896      897       +1     


Member

@roman-khimov roman-khimov left a comment


  1. You can compare old/new and avoid updates if nothing changed.
  2. You're likely to have concurrency problems here when updating c.
  3. Not sure what updateLocalState gives us; we mostly want to force a node netmap update (send an appropriate tx), and it works the other way around.
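The first point of the review (skip the update when nothing changed) could look roughly like the sketch below. It assumes attributes can be represented as plain strings and that their order does not matter; both are assumptions for illustration, and `attributesChanged` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"slices"
)

// attributesChanged reports whether newAttrs differs from oldAttrs.
// Order is ignored (an assumption of this sketch): both lists are
// copied and sorted before comparison, so the inputs stay untouched.
func attributesChanged(oldAttrs, newAttrs []string) bool {
	a := slices.Clone(oldAttrs)
	b := slices.Clone(newAttrs)
	slices.Sort(a)
	slices.Sort(b)
	return !slices.Equal(a, b)
}

func main() {
	current := []string{"Price:10", "Country:DE"}
	reloaded := []string{"Country:DE", "Price:10"}

	if attributesChanged(current, reloaded) {
		fmt.Println("attributes changed, node info update needed")
	} else {
		fmt.Println("attributes unchanged, skipping update")
	}
}
```

Only when the helper returns true would the node need to send a new netmap transaction.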

@End-rey End-rey force-pushed the 1870-support-reloading-node-attributes-with-sighup branch from b323897 to 59a00ed Compare November 11, 2024 17:05
@End-rey End-rey requested a review from roman-khimov November 13, 2024 08:43
Member

@roman-khimov roman-khimov left a comment


Maybe an atomic pointer would be easier to handle here, but locks can be used too.
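For reference, the atomic pointer alternative mentioned here might be sketched as follows with `sync/atomic.Pointer` (Go 1.19+). The `nodeInfo` and `nodeState` types are hypothetical stand-ins for the node's real configuration structures.

```go
package main

import (
	"fmt"
	"slices"
	"sync/atomic"
)

// nodeInfo is a hypothetical immutable snapshot of local node info.
type nodeInfo struct {
	attributes []string
}

// nodeState holds the current snapshot behind an atomic pointer, so
// readers never block and never observe a half-updated value.
type nodeState struct {
	info atomic.Pointer[nodeInfo]
}

// load returns the current snapshot, or nil if none was stored yet.
func (s *nodeState) load() *nodeInfo {
	return s.info.Load()
}

// replaceAttributes publishes a new snapshot built from a copy of attrs.
func (s *nodeState) replaceAttributes(attrs []string) {
	s.info.Store(&nodeInfo{attributes: slices.Clone(attrs)})
}

func main() {
	var st nodeState
	st.replaceAttributes([]string{"Country:DE"})
	fmt.Println(st.load().attributes)
}
```

The trade-off is that every update allocates a new snapshot, while a lock allows in-place changes at the cost of blocking readers during writes.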

@End-rey End-rey force-pushed the 1870-support-reloading-node-attributes-with-sighup branch from 59a00ed to 7a367b0 Compare November 13, 2024 15:20
Fix the incorrect expression that checks for reconnection. Previously the check
worked the other way around.

Signed-off-by: Andrey Butusov <[email protected]>
@End-rey End-rey force-pushed the 1870-support-reloading-node-attributes-with-sighup branch 2 times, most recently from 52de9be to 47149c2 Compare November 18, 2024 15:26
@End-rey
Contributor Author

End-rey commented Nov 18, 2024

@cthulhu-rider Can you help me, please?
I have problems with the tests after updating the SDK:
1.

--- FAIL: TestFormatValidator_Validate (0.01s)
    --- FAIL: TestFormatValidator_Validate/incorrect_session_token (0.00s)
        --- FAIL: TestFormatValidator_Validate/incorrect_session_token/wrong_signature (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x8c2558]

goroutine 25 [running]:
testing.tRunner.func1.2({0xa05da0, 0x10a8350})
        /snap/go/current/src/testing/testing.go:1632 +0x230
testing.tRunner.func1()
        /snap/go/current/src/testing/testing.go:1635 +0x35e
panic({0xa05da0?, 0x10a8350?})
        /snap/go/current/src/runtime/panic.go:785 +0x132
github.com/nspcc-dev/neofs-sdk-go/crypto.PublicKeyBytes({0x0, 0x0})
        /home/endrey/go/pkg/mod/github.com/nspcc-dev/[email protected]/crypto/util.go:16 +0x18
github.com/nspcc-dev/neofs-sdk-go/session.commonData.AssertAuthKey({0x1, {0x81, 0x78, 0x56, 0x98, 0x1e, 0x55, 0x45, 0x7e, 0xab, ...}, ...}, ...)
        /home/endrey/go/pkg/mod/github.com/nspcc-dev/[email protected]/session/common.go:339 +0x25
github.com/nspcc-dev/neofs-node/pkg/core/object.(*FormatValidator).validateSignatureKey(0xc000218498?, 0xc000200a80)
        /home/endrey/neo/neofs-node/pkg/core/object/fmt.go:198 +0x1b8
github.com/nspcc-dev/neofs-node/pkg/core/object.(*FormatValidator).Validate(0xc000266010, 0xc000200a80, 0x0)
        /home/endrey/neo/neofs-node/pkg/core/object/fmt.go:164 +0x6e8
github.com/nspcc-dev/neofs-node/pkg/core/object.TestFormatValidator_Validate.func6.1(0xc00025cea0)
        /home/endrey/neo/neofs-node/pkg/core/object/fmt_test.go:129 +0x18c

This one may be caused by this commit, which no longer returns an error when the public key is missing.
Should I add a nil check in this function?

func (v *FormatValidator) validateSignatureKey(obj *object.Object) error {

2.

--- FAIL: TestBlobStor_Put_Overflow (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa05e1f]

goroutine 12 [running]:
testing.tRunner.func1.2({0xaadd60, 0x11e6110})
        /snap/go/current/src/testing/testing.go:1632 +0x230
testing.tRunner.func1()
        /snap/go/current/src/testing/testing.go:1635 +0x35e
panic({0xaadd60?, 0x11e6110?})
        /snap/go/current/src/runtime/panic.go:785 +0x132
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor.(*BlobStor).Put(0xc00011a310, {{{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...})
        /home/endrey/neo/neofs-node/pkg/local_object_storage/blobstor/put.go:34 +0x15f
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor_test.TestBlobStor_Put_Overflow(0xc0002f6340)
        /home/endrey/neo/neofs-node/pkg/local_object_storage/blobstor/put_test.go:48 +0x19f

This may be caused by this commit.
Should I check for nil here?

prm.RawData = prm.Object.Marshal()

3.

--- FAIL: TestShardReload (0.27s)
    logger.go:146: 2024-11-18T12:57:28.906+0300 DEBUG   opening...      {"component": "BlobStor"}
    logger.go:146: 2024-11-18T12:57:28.916+0300 DEBUG   initializing... {"component": "BlobStor"}
    logger.go:146: 2024-11-18T12:57:28.942+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "3q1D7ykZnBWGnAQtt5T3RRiSpAqWAGFiLPdmXJCWRG6L/CZQuQVBCTmjtDuB6n42Pm7RKB4ebzdwTFRgoEbpddTmz", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:28.973+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "7QRMDrx1kFBaq7R2pwu1b1LncrF2XbktRBZipvWRbPkq/3V9Kmqk36yuRmtYAAxTbwh4LbQsStwL2zhGs7QmSq1zc", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:29.007+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "2ZL5irCwvkdksinQHbjrD4A7NvhSbr36R5qDbtLZXmXv/9f8rcbsQYCUHckcHcj3WprsQ4eJdyph15xPV75tDrvWR", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:29.037+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "29k7ZTwNGAU6aUMnfGivecwRqFDhEx2KhaXGST5ycQe5/5UUzS5Hk3omMmVHCudnwn2FZ5ZeuAASg3vSFa1VNSVE", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:29.073+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "6EDxrhcCuXDMDo59zW6hVwK1MaNea8CacM6rmhuBXuuX/GUUEtSt4VGun2TXwb5kP6vm6zF27g2V2Qc3vVbPfsiC4", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:29.094+0300 INFO    trying to restore read-write mode
    logger.go:146: 2024-11-18T12:57:29.094+0300 INFO    setting shard mode      {"old_mode": "READ_WRITE", "new_mode": "READ_WRITE"}
    logger.go:146: 2024-11-18T12:57:29.094+0300 INFO    shard mode set successfully     {"mode": "READ_WRITE"}
    logger.go:146: 2024-11-18T12:57:29.112+0300 INFO    trying to restore read-write mode
    logger.go:146: 2024-11-18T12:57:29.113+0300 INFO    setting shard mode      {"old_mode": "READ_WRITE", "new_mode": "READ_WRITE"}
    logger.go:146: 2024-11-18T12:57:29.113+0300 INFO    shard mode set successfully     {"mode": "READ_WRITE"}
    logger.go:146: 2024-11-18T12:57:29.132+0300 INFO    local object storage operation  {"component": "BlobStor", "address": "5Hu41RGaong7w6x8FMmdR9roYcdrbR7Wy1m3aJ2pt34J/13tNqp8evbT3ZZUJ23pgp2H8avWZYEThvNW1k1xn9sPS", "op": "PUT", "type": "fstree", "storage_id": ""}
    logger.go:146: 2024-11-18T12:57:29.168+0300 WARN    could not unmarshal object      {"address": "5Hu41RGaong7w6x8FMmdR9roYcdrbR7Wy1m3aJ2pt34J/13tNqp8evbT3ZZUJ23pgp2H8avWZYEThvNW1k1xn9sPS", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.169+0300 WARN    could not unmarshal object      {"address": "7QRMDrx1kFBaq7R2pwu1b1LncrF2XbktRBZipvWRbPkq/3V9Kmqk36yuRmtYAAxTbwh4LbQsStwL2zhGs7QmSq1zc", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.169+0300 WARN    could not unmarshal object      {"address": "29k7ZTwNGAU6aUMnfGivecwRqFDhEx2KhaXGST5ycQe5/5UUzS5Hk3omMmVHCudnwn2FZ5ZeuAASg3vSFa1VNSVE", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.169+0300 WARN    could not unmarshal object      {"address": "2ZL5irCwvkdksinQHbjrD4A7NvhSbr36R5qDbtLZXmXv/9f8rcbsQYCUHckcHcj3WprsQ4eJdyph15xPV75tDrvWR", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.169+0300 WARN    could not unmarshal object      {"address": "3q1D7ykZnBWGnAQtt5T3RRiSpAqWAGFiLPdmXJCWRG6L/CZQuQVBCTmjtDuB6n42Pm7RKB4ebzdwTFRgoEbpddTmz", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.170+0300 WARN    could not unmarshal object      {"address": "6EDxrhcCuXDMDo59zW6hVwK1MaNea8CacM6rmhuBXuuX/GUUEtSt4VGun2TXwb5kP6vm6zF27g2V2Qc3vVbPfsiC4", "err": "invalid header: invalid session token: missing session issuer"}
    logger.go:146: 2024-11-18T12:57:29.173+0300 INFO    trying to restore read-write mode
    logger.go:146: 2024-11-18T12:57:29.173+0300 INFO    setting shard mode      {"old_mode": "READ_WRITE", "new_mode": "READ_WRITE"}
    logger.go:146: 2024-11-18T12:57:29.173+0300 INFO    shard mode set successfully     {"mode": "READ_WRITE"}
    --- FAIL: TestShardReload/open_meta_at_new_path (0.08s)
        reload_test.go:69: 
                Error Trace:    /home/endrey/neo/neofs-node/pkg/local_object_storage/shard/reload_test.go:69
                                                        /home/endrey/neo/neofs-node/pkg/local_object_storage/shard/reload_test.go:100
                Error:          Not equal: 
                                expected: true
                                actual  : false
                Test:           TestShardReload/open_meta_at_new_path
                Messages:       object #0 is missing
FAIL
FAIL    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard  29.652s

I think some differences here break the test.
I don't know why the session token owner ID is nil.

4.

--- FAIL: TestHeadRequest (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb3d9db]

goroutine 9 [running]:
testing.tRunner.func1.2({0xbfc840, 0x1492760})
        /snap/go/current/src/testing/testing.go:1632 +0x230
testing.tRunner.func1()
        /snap/go/current/src/testing/testing.go:1635 +0x35e
panic({0xbfc840?, 0x1492760?})
        /snap/go/current/src/runtime/panic.go:785 +0x132
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.headersFromObject(0xc000328660, {0x88, 0x3c, 0xd5, 0xb2, 0xe1, 0x24, 0xa1, 0xa5, 0x49, ...}, ...)
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/object.go:52 +0x3fb
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.(*cfg).localObjectHeaders(0xc2583cdf8424537b?, {0x88, 0x3c, 0xd5, 0xb2, 0xe1, 0x24, 0xa1, 0xa5, 0x49, ...}, ...)
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/headers.go:273 +0x12e
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.(*cfg).readObjectHeaders(0xc0000ba8a0, 0xc0001af9e0)
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/headers.go:124 +0x6aa
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.NewMessageHeaderSource({0xc0001afb40, 0x4, 0xc0001afa98?})
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/headers.go:74 +0xc5
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.TestHeadRequest.func1(0xc000189380)
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/eacl_test.go:115 +0x16e
github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2.TestHeadRequest(0xc000189380)
        /home/endrey/neo/neofs-node/pkg/services/object/acl/eacl/v2/eacl_test.go:134 +0xd78
testing.tRunner(0xc000189380, 0xdf8488)
        /snap/go/current/src/testing/testing.go:1690 +0xf4
created by testing.(*T).Run in goroutine 1
        /snap/go/current/src/testing/testing.go:1743 +0x390
FAIL    github.com/nspcc-dev/neofs-node/pkg/services/object/acl/eacl/v2 0.013s

This commit makes Version nil.
Maybe fix this by checking the version first.
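Checking the version first could look roughly like this sketch, which skips the version header when the pointer is nil. The `version` and `objectHeader` types and the `versionString` helper are hypothetical and only model the nil case; they are not the SDK or eACL types.

```go
package main

import "fmt"

// version is a hypothetical stand-in for the SDK protocol version.
type version struct {
	major, minor uint32
}

// objectHeader is a hypothetical header whose version may be unset.
type objectHeader struct {
	version *version
}

// versionString returns the formatted version and true, or an empty
// string and false when the header carries no version.
func versionString(h objectHeader) (string, bool) {
	if h.version == nil {
		return "", false
	}
	return fmt.Sprintf("v%d.%d", h.version.major, h.version.minor), true
}

func main() {
	if _, ok := versionString(objectHeader{}); !ok {
		fmt.Println("version is not set, skipping the version header")
	}
}
```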

@cthulhu-rider
Contributor

@End-rey about the tests:

  1. The node must pre-check that the public key is present. But there is no reason to panic in the lib; I'll add the check there.
  2. Definitely, prms must be filled in correctly. Let's both a) add the check and return an error from the engine and b) fix the input in the test.
  3. Yep, the latest decoder complains about missing mandatory token fields, although it shouldn't. This is a lib flaw; I'll fix it.
  4. A nil version must be treated specially now. As far as I can see, you've already fixed this, right?

@End-rey
Contributor Author

End-rey commented Nov 19, 2024

@cthulhu-rider I fixed everything that needed to be fixed. And what are we going to do with the 3rd point? Wait for the problem to close in the SDK?

@cthulhu-rider
Contributor

cthulhu-rider commented Nov 20, 2024

Wait for the problem to close in the SDK?

Yeah, but in the background; there is nothing to do here for now.

https://github.com/nspcc-dev/neofs-node/actions/runs/11913864433/job/33200509043?pr=3005 bothers me more

@roman-khimov
Member

What's up with tests here?

@roman-khimov
Member

Well, the node is broken with the new SDK. We either fix something or revert to the old patch here that used api-go (pushing the SDK update into a separate PR). I suggest the latter since I have no idea how long it will take to fix the SDK-related problems.

@End-rey End-rey force-pushed the 1870-support-reloading-node-attributes-with-sighup branch 2 times, most recently from 48aa919 to 2815c4f Compare November 21, 2024 09:45
Fix incorrect behavior in the test.

Signed-off-by: Andrey Butusov <[email protected]>
Add a new function `cfg.reloadNodeAttributes` that updates the list of node
attributes.
Add RWMutex for `cfg.cfgNodeInfo.localInfo`.
Add docs.

Closes #1870.

Signed-off-by: Andrey Butusov <[email protected]>
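The commit message above mentions guarding `cfg.cfgNodeInfo.localInfo` with an RWMutex. A minimal sketch of that pattern follows; the `localInfo` type and its methods are hypothetical illustrations, not the code merged in this PR.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// localInfo is a hypothetical holder of node attributes protected by
// a read-write mutex: many concurrent readers, one writer at a time.
type localInfo struct {
	mu         sync.RWMutex
	attributes []string
}

// Attributes returns a copy of the current attributes under a read lock.
func (l *localInfo) Attributes() []string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return slices.Clone(l.attributes)
}

// Reload replaces the attributes under a write lock, e.g. after SIGHUP.
func (l *localInfo) Reload(attrs []string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.attributes = slices.Clone(attrs)
}

func main() {
	var info localInfo
	info.Reload([]string{"Country:DE"})

	// Readers may run concurrently with a reload without a data race.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = info.Attributes()
		}()
	}
	info.Reload([]string{"Country:DE", "Price:10"})
	wg.Wait()

	fmt.Println(info.Attributes())
}
```

Returning copies keeps callers from mutating shared state outside the lock.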
@End-rey End-rey force-pushed the 1870-support-reloading-node-attributes-with-sighup branch from 2815c4f to d430daa Compare November 21, 2024 14:08
@roman-khimov roman-khimov merged commit 1092989 into master Nov 21, 2024
20 of 22 checks passed
@roman-khimov roman-khimov deleted the 1870-support-reloading-node-attributes-with-sighup branch November 21, 2024 15:58