Object(s) are not relocated when removing a node, running rebalance, and then adding the same node back #108

Closed
yosukehara opened this issue Nov 28, 2013 · 4 comments

Comments

@yosukehara
Member

Operation flow:

(Running Cluster)
- detach [email protected]
- rebalance
- attach [email protected]
- rebalance
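
For reference, a minimal sketch of the same flow driven from the manager console. Connecting via telnet on port 10010 is an assumption about the default setup, and depending on the LeoFS version the attach step may instead happen implicitly when the storage node is started again:

$ telnet localhost 10010
detach [email protected]
rebalance
attach [email protected]
rebalance
status

After each rebalance, status should show ring (cur) and ring (prev) converging to the same hash on every running node, as in the dumps below.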
@ghost ghost assigned yosukehara Nov 28, 2013
@mocchira
Member

I checked whether this issue was solved on the latest develop branch, but it still remains.
However, the result has changed: the rebalance after attaching [email protected] now fails with the following error.
[ERROR] "Fail rebalance"

The output of status on the manager console is:

status
[System config]
                system version : 0.16.5
                total replicas : 3
           # of successes of R : 1
           # of successes of W : 2
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
              ring hash (cur)  : bb7a8bfc
              ring hash (prev) : bb7a8bfc

[Node(s) state]
------------------------------------------------------------------------------------------------------
 type node                         state       ring (cur)    ring (prev)   when                        
------------------------------------------------------------------------------------------------------
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:30 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:30 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:33 +0900
 S    [email protected]       attached    d74cef7f      df6222db      2013-11-29 17:35:28 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:33 +0900
 G    [email protected]     running     bb7a8bfc      bb7a8bfc      2013-11-29 17:07:52 +0900

After a while, the output changed slightly.

status
[System config]
                system version : 0.16.5
                total replicas : 3
           # of successes of R : 1
           # of successes of W : 2
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
              ring hash (cur)  : bb7a8bfc
              ring hash (prev) : bb7a8bfc

[Node(s) state]
------------------------------------------------------------------------------------------------------
 type node                         state       ring (cur)    ring (prev)   when                        
------------------------------------------------------------------------------------------------------
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:30 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:30 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:33 +0900
 S    [email protected]       attached    bb7a8bfc      bb7a8bfc      2013-11-29 17:35:40 +0900
 S    [email protected]       running     bb7a8bfc      bb7a8bfc      2013-11-29 17:35:33 +0900
 G    [email protected]     running     bb7a8bfc      bb7a8bfc      2013-11-29 17:07:52 +0900

The attached node's ring seems to be synced with the others.
Below is the output on remote_console:

([email protected])14> leo_redundant_manager_api:get_members().
{ok,[{member,'[email protected]',"node_758266fb",
             "192.168.200.26",13075,ipv4,1385712456577253,running,168,[],
             []},
     {member,'[email protected]',"node_0ad1c0e3",
             "192.168.200.25",13075,ipv4,[],running,168,[],[]},
     {member,'[email protected]',"node_6dffdb07",
             "192.168.200.23",13075,ipv4,1385712447078001,running,168,[],
             []},
     {member,'[email protected]',"node_59da3d3c",
             "192.168.200.22",13075,ipv4,1385712446638246,running,168,[],
             []},
     {member,'[email protected]',"node_e7ac58fd",
             "192.168.200.21",13075,ipv4,1385712446172865,running,168,[],
             []}]}
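
For reference, a minimal sketch (run in the same remote_console) that lists each member's node name and state from the get_members/0 result. The element positions are read off the tuple layout shown above and are otherwise an assumption about the member record:

%% node name is the 2nd element, state the 8th element of each member tuple
([email protected])15> {ok, Members} = leo_redundant_manager_api:get_members().
([email protected])16> [{element(2, M), element(8, M)} || M <- Members].

Given the output above, every member, including the re-attached [email protected], should come back as running.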

@yosukehara
Member Author

The fix I made this morning is not complete, so I'll check and correct it.

@yosukehara
Member Author

I've fixed this issue; see "leofs/issues/108#issuecomment-29503463".

Next, I'll thoroughly check the state of the objects after the rebalance with "leofs_test" (see the spot-check sketch after the status output below).

[State of the system]

status
[System config]
                system version : 0.16.5
                total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
              ring hash (cur)  : 9aeecffa
              ring hash (prev) : 73d9f4b2

[Node(s) state]
-------------------------------------------------------------------------------------------------
 type node                    state       ring (cur)    ring (prev)   when
-------------------------------------------------------------------------------------------------
 S    [email protected]     running     9aeecffa      73d9f4b2      2013-11-29 09:32:34 +0000
 S    [email protected]     running     9aeecffa      73d9f4b2      2013-11-29 09:33:04 +0000
 S    [email protected]     running     9aeecffa      73d9f4b2      2013-11-29 09:33:02 +0000

Thanks for your report.

@yosukehara
Member Author

Fixed this issue with leo_redundant_manager v1.2.4.
