Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daemon to support network restore and Macvlan driver exits experimental #23524

Merged
merged 3 commits into from
Jun 15, 2016
Merged

Daemon to support network restore and Macvlan driver exits experimental #23524

merged 3 commits into from
Jun 15, 2016

Conversation

aboch
Copy link
Contributor

@aboch aboch commented Jun 14, 2016

This PR brings:

@thaJeztah
Copy link
Member

ping @crosbymichael PTAL

Signed-off-by: Alessandro Boch <[email protected]>
@icecrime
Copy link
Contributor

Quick question: is it possible that I run a daemon, initialize a Swarm, change my daemon config to add --live-restore, and reload this configuration? That would be a loophole in the validation process. (cc @crosbymichael)

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

@icecrime Need to think about it. But existing validation logic would have same issue.
Being this to cover a corner case, would this be a good candidate for a bug fix. Do not want to rush a decision on the validation now.

@mavenugo
Copy link
Contributor

@aboch no. I added a check if daemon.clusterProvider != nil { in reloadClusterDiscovery to handle this case for --cluster-store configruation.

I think you should do a similar check for --live-restore as well

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

@icecrime @mavenugo Ah, then the same would work with the new check.

I just tested the scenario depicted by @icecrime:

  • start daemon w/o live-restore
  • init swarm
  • stop daemon and restart with --live-restore

daemon start fails with:

FATA[0002] Error creating cluster component: swarm mode is incompatible with --live-restore option

@icecrime
Copy link
Contributor

@aboch The scenario I was mentioning was about hot-reloading the configuration, not restarting the daemon!

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

@icecrime Verified the config reload leads to the same validation:

DEBU[0045] Calling GET /v1.24/containers/json           
DEBU[0049] aboch-W540: Initiating bulk sync with nodes [] 
INFO[0071] Got signal to reload configuration, reloading from: /home/alessandro/config.json 
ERRO[0071] Error reconfiguring the daemon: --cluster-store and --cluster-advertise daemon configurations are incompatible with swarm mode 
DEBU[0079] aboch-W540: Initiating bulk sync with nodes [] 

@mavenugo mavenugo added this to the 1.12.0 milestone Jun 14, 2016
@mavenugo
Copy link
Contributor

@aboch

ERRO[0071] Error reconfiguring the daemon: --cluster-store and --cluster-advertise daemon configurations are incompatible with swarm mode

It should be --live-restore in your case.

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

LOL, I didn't realize the same error message was being printed in different places...
I assumed the same validation function was being executed from different paths

Will take care of it shortly

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

@mavenugo @icecrime Thanks, PTAL, the inner compatibility check is now a property of the config and same function is used for validation in the two paths.

Follows the daemon logs for two hot reloads after the swarm init: In the first one the json config specifies the live restore. In the second one it specifies the cluster store config:

INFO[0053] Got signal to reload configuration, reloading from: /home/alessandro/config.json 
ERRO[0053] Error reconfiguring the daemon: --live-restore daemon configuration is incompatible with swarm mode 

INFO[0068] Got signal to reload configuration, reloading from: /home/alessandro/config.json 
ERRO[0068] Error reconfiguring the daemon: --cluster-store and --cluster-advertise daemon configurations are incompatible with swarm mode 
DEBU[0071] aboch-W540: Initiating bulk sync with nodes [] 

@mlaventure
Copy link
Contributor

@aboch

23:35:56 daemon\daemon.go:380: daemon.configStore.isSwarmCompatible undefined (type *Config has no field or method isSwarmCompatible)
23:35:56 daemon\daemon.go:935: config.isSwarmCompatible undefined (type *Config has no field or method isSwarmCompatible)

Windows failure is genuine.

@aboch
Copy link
Contributor Author

aboch commented Jun 14, 2016

Thanks @mlaventure. Updated.

@icecrime
Copy link
Contributor

LGTM 👍

@icecrime
Copy link
Contributor

One failure on janky (related?):

17:12:10 FAIL: docker_cli_links_test.go:32: DockerSuite.TestLinksPingLinkedContainers
17:12:10 
17:12:10 docker_cli_links_test.go:46:
17:12:10     // 3. Ping by hostname
17:12:10     dockerCmd(c, append(runArgs, fmt.Sprintf(pingCmd, "fred", "wilma"))...)
17:12:10 /go/src/github.com/docker/docker/pkg/integration/dockerCmd_utils.go:42:
17:12:10     c.Assert(err, check.IsNil, check.Commentf(quote+"%v"+quote+" failed with errors: %s, %v", strings.Join(args, " "), out, err))
17:12:10 ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc8231a3320), Stderr:[]uint8(nil)} ("exit status 137")
17:12:10 ... "run --rm --link container1:alias1 --link container2:alias2 busybox sh -c ping -c 1 fred -W 1 && ping -c 1 wilma -W 1" failed with errors: , exit status 137

@aboch
Copy link
Contributor Author

aboch commented Jun 15, 2016

@icecrime
Not sure, that test pass on my machine:

---> Making bundle: .ensure-nnp-test (in bundles/1.12.0-dev/test-integration-cli)
+ go test -check.f DockerSuite.TestLinksPingLinkedContainers -check.v -check.timeout=5m -test.timeout=360m github.com/docker/docker/integration-cli
INFO: Testing against a local daemon
PASS: docker_cli_links_test.go:32: DockerSuite.TestLinksPingLinkedContainers    1.735s
PASS: docker_cli_links_test.go:50: DockerSuite.TestLinksPingLinkedContainersAfterRename 1.393s
PASS: docker_cli_exec_test.go:365: DockerSuite.TestLinksPingLinkedContainersOnRename    0.640s
OK: 3 passed
PASS
coverage: 10.4% of statements
---> Making bundle: .integration-daemon-stop (in bundles/1.12.0-dev/test-integration-cli)

Let me look at it closer

@icecrime
Copy link
Contributor

Hopefully it's unrelated. @crosbymichael will be testing the PR tonight.

@crosbymichael
Copy link
Contributor

LGTM

@tonistiigi
Copy link
Member

Ping test failure in ci, not sure if related.

I would prefer IsSwarmCompatible configs checks to be handled in a layer above cluster and daemon instead of using executor callback that is not designed for these kinds of things. But this would probably require a larger refactor. LGTM

@icecrime
Copy link
Contributor

Weird failure in docker-py:

TestNetworks.test_create_network_with_host_driver_fails ____________
02:21:40 /docker-py/tests/integration/network_test.py:91: in test_create_network_with_host_driver_fails
02:21:40     self.client.create_network(net_name, driver='host')
02:21:40 E   Failed: DID NOT RAISE

@icecrime
Copy link
Contributor

The failure in docker-py is legitimate, but @aboch is aware and will fix in a followup PR. Amazing that docker-py suite caught it :-)

@icecrime icecrime merged commit 9119795 into moby:master Jun 15, 2016
@aboch
Copy link
Contributor Author

aboch commented Jun 15, 2016

Yes, it is a genuine failure, good docker-py test.
Not a big deal, docker will allow user to create a host network when the one provisioned by the daemon is already present. Second creation will fail.
Will open an issue, should be a simple fix.

@coolljt0725
Copy link
Contributor

👏

@mavenugo mavenugo changed the title Daemon to support network restore Daemon to support network restore and Macvlan driver exits experimental Jun 17, 2016
@dreamcat4
Copy link

Thanks guys - really appreciate this.

@@ -120,3 +121,13 @@ func (config *Config) GetAllRuntimes() map[string]types.Runtime {
config.reloadLock.Unlock()
return rts
}

func (config *Config) isSwarmCompatible() error {
if config.IsValueSet("cluster-store") || config.IsValueSet("cluster-advertise") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this code from config.ClusterStore != "" to config.IsValueSet("cluster-store")?
As I know, config.IsValueSet() checks only configs which were set in the configuration file.
So this code always returns false even if using --cluster-store option.

@aboch
Copy link
Contributor Author

aboch commented Jun 28, 2016

@ncoolz Thank you for spotting this. In my initial diff I was looking for the config parameter value and tested with that, later I moved to the IsValueSet(), that is a mistake.

Please let me know if you can push a patch to fix this, otherwise I will take care of it.

Thanks again.

@ncoolz
Copy link
Contributor

ncoolz commented Jun 28, 2016

@aboch I will send a PR immediately. Thank you.

@thaJeztah
Copy link
Member

Thanks @ncoolz!

@eungjun-yi
Copy link
Contributor

./experimental/vlan-networks.md still say that macvlan driver is in experimental mode. I think this should be updated.

The Macvlan and Ipvlan drivers are currently in experimental mode in order to incubate Docker users use cases and vet the implementation to ensure a hardened, production ready driver in a future release.

-- https://github.com/docker/docker/blob/master/experimental/vlan-networks.md#user-content-getting-started

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.