Add additional v3 vm sizes permitted for dcos master and agent #2184
@yakman2020 These const values are auto-generated, so we'll have to update the script at
Do we want to decrease the value of this constant here?: https://github.com/Azure/acs-engine/blob/master/pkg/acsengine/Get-AzureConstants.py#L23 (That will inevitably bring a bunch more master SKU sizes into the whitelist; I assume that's O.K.)
Yup. We really only need 30-40GB for basic DCOS, though the users may disagree. I would try 80GB initially.
Standard_D2s_v3 looks to only expose 16GB, so changing the minimum from ~100GB to ~80GB will still exclude Standard_D2s_v3.
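To make the threshold argument concrete, here is a minimal sketch of the comparison being discussed. It is not the actual logic of `Get-AzureConstants.py`: the function name `allowed_sizes`, the dictionary layout, and the two `Example_Size_*` entries are assumptions made up for illustration. Only the 16GB figure for Standard_D2s_v3 and the ~100GB and ~80GB thresholds come from the comments above.

```python
# Hypothetical sketch of a minimum-disk-size filter; the real generator
# script may compute its whitelist differently.
MIN_DISK_GB_CURRENT = 100   # approximate current minimum from the thread
MIN_DISK_GB_PROPOSED = 80   # proposed lower minimum from the thread


def allowed_sizes(reported_disk_gb, min_disk_gb):
    """Return the sorted names of sizes whose reported disk meets the minimum."""
    return sorted(
        name
        for name, disk_gb in reported_disk_gb.items()
        if disk_gb >= min_disk_gb
    )


reported_disk_gb = {
    "Standard_D2s_v3": 16,       # value quoted in the comment above
    "Example_Size_90GB": 90,     # made-up entry, for illustration only
    "Example_Size_200GB": 200,   # made-up entry, for illustration only
}

current = allowed_sizes(reported_disk_gb, MIN_DISK_GB_CURRENT)
proposed = allowed_sizes(reported_disk_gb, MIN_DISK_GB_PROPOSED)

print("current whitelist: ", current)
print("proposed whitelist:", proposed)
```

Under these assumed numbers, lowering the minimum admits the made-up 90GB entry but still leaves Standard_D2s_v3 out, which is the point of the comment: if the reported 16GB is what the filter sees, changing the threshold alone does not help.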
So, I don't see how that could be correct. I'm deploying on D2s_v3 and they have 32GB main memory and a 128GB system disk. Something is wrong with that data, albeit I'm specifying the size of the OS disk.
Not correct. OK. My D3s instance has a 40GB system disk. That is fine. Presumably I can specify a larger one in the deployment.
Force-pushed from 08d3f98 to 47eb206.
lgtm
* update ubuntu image for German cloud (Azure#2036) * Fix issue with apiserver when using AADProfile (Azure#2047) (Azure#2055) * Fix issue with apiserver when using AADProfile (Azure#2047) * fixing failed test * missed another test * clear containers (Azure#1945) * clear-containers: add runtime to api and pass through parameters Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: add scripts Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: add example Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: fix variables Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: add docs Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: update install script Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: fix script Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: update example Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: update features docs Signed-off-by: Jess Frazelle <[email protected]> * clear-containers: make test linters happy Signed-off-by: Jess Frazelle <[email protected]> * setKubeletOpts to work better with kubeconfig Signed-off-by: Jess Frazelle <[email protected]> * whitespace cruft * more whitespace fun * Add --feature-gates handling for kubelet and api server (Azure#2032) * Add --feature-gates handling For kubeletConfig, preserve the existing behaviour of adding Accelerators=true for agent config (for kubernetes 1.6.0 and later) * simplified default implementation and removed KUBELET_FEATURE_GATES * unnecessary default assignment, and simple validation * removed copyMap func * enforce min version, "Accelerators=true" is only for agents * Pass current version in to addDefaultFeatureGates Generalise addDefaultFeatureGates by passing in current orchestrator version, and deferring the minimum required version to callers * Add tests for --feature-gates behaviour Test for only adding Accelerators=true for 1.6.0+ 
Test for correct application of KubeletConfig from top-level vs Master/AgentProfile * Avoid ref sharing for kubeletConfig Fix failing test by avoiding reference sharing of kubeletConfig properites on master and agent profiles * Remove outdated comments * k8s/script: allow parallelizing custom script without clear-containers (Azure#2067) Signed-off-by: Jess Frazelle <[email protected]> * Improving IP address assignment for master nodes with Azure CNI. (Azure#1966) * Azure cni static ip change (#1) * VSTS#1828538 Modified master IPs from dynamic to static and made agent nic dependent on master nic * Modified firstconsecutive static IP for default vnet * updated documentation specific to Azure CNI * Fixed styling test errors * Handled no master scenario * updated one of the examples cluster config to use firstConsecutiveStaticIp from the start of subnet * removed azureCNI check so that agent nic will always depend on master nic * Specified dependency of agent nic on master nic in windows agents template * moving firstConsecutiveStaticIP away from the edge of usable address space * Update documentation to help customers how to specify firstConsecutiveStaticIP and ipAddressCount for master nodes. * Add support for Kubernetes v1.8.7 (Azure#2068) * added support for Kubernetes v1.8.7 // TODO: build/publish Windows image * fix unit test * fixed dirty windows 1.8.6 build artifact * Upgrade Azure CNI to 1.0.1 (Azure#2064) * Azure CNI version bump * s/conf/conflist * Adopt CIS Kubernetes Benchmark, Part 2: Controller Manager. (Azure#2066) * Jenkins soak tests (Azure#2028) * added soak test * added soak test name * add location to name * delete rg if re-provisioning cluster * fix typo * added wait flag * fix no wait arg * add err log * remove ssh key * remove unused vars * Revert "remove ssh key" This reverts commit 9041c1a. 
* terminology * delete files if deployment failed * remove soak test spec * Pass in soak cluster name * only generate ssh key once * add default for KeyGenerated * fix typo * Do not generate SSH for soak test * fix typo in comment * add dashboard test debug log * Revert "add dashboard test debug log" This reverts commit 4937282. * Add Enable Pod Security Option (Azure#2048) * Add PodSecurityPolicy * use helpers.IsTrueBoolPointer, delete EnablePodSecurityPolicy function and update defaultAPIServerConfig * latest dashboard for v1.8 and v1.9 clusters (Azure#2070) * latest addon-resizer for v1.8 and v1.9 clusters (Azure#2071) * update kube-dns for v1.8 and v1.9 k8s clusters (Azure#2073) * latest kube-dns for v1.8 and v1.9 clusters * k8s-dns-dnsmasq-nanny-amd64:1.14.5 * latest heapster for v1.8 and v1.9 clusters (Azure#2072) * update pause image for v1.8 and v1.9 k8s clusters (Azure#2074) * latest pause image for v1.8 and v1.9 clusters * omitted v1.9.1 * changed wording of error (Azure#2089) * update Ubuntu image (Azure#2079) * Update docs for --feature-gates (Azure#2081) * re-enable read-only port on kubelet (Azure#2091) fixes heapster connection issues * revert addon-resizer version update (Azure#2090) * revert heapster version and re-enable kubelet read-only-port * revert addon-resizer to 1.7 * isolated bug fix to addon-resizer version * Add support for k8s 1.9.2 (Azure#2092) * add support for k8s 1.9.2 * updated windows zip * revert to 1.7 for addon resizer * Extend windows os drive size when customized OSDiskSizeGB is used (Azure#2097) * Adopt CIS Kubernetes Benchmark, Part 3: Kubelet (Azure#2098) * Restore KubernetesConfig sans struct embedding (Azure#2108) * restore properties to KubernetesConfig * lint * comment * rebase errata * Kind should be only EncryptionConfig for encyption-config.yaml (Azure#2104) * Kind should be only Config * removed the wrong Kind entry! 
* remove redundant apiVersion * running CSE provisioning script in foreground (Azure#2113) * Add autoscale test to E2E (Azure#2096) * initial attempt at autoscale test * working autoscale test * only add add’l options if passed in * Adding 3 replicas for load tester deployment * wait longer and linux only * skip autoscale test for v1.9 clusters Azure#2114 * Add member update after restarting etcd (Azure#2118) * Remove SecurityContextDeny setting in API server admission control. (Azure#2125) * Update custom vnet doc (Azure#2128) * add Azure Active Directory Admin Group Object ID flag (Azure#2111) * we don’t want to see stderr when checking for provision.complete (Azure#2126) * only create cert files on master (Azure#2120) * only create cert files on master * master node provision script cleanup * Enable iptables forward for kubernetes (Azure#2139) * --authorization-mode=Node only if secure kubelet (Azure#2138) * --authorization-mode=Node only if secure kubelet * EnableSecureKubelet unit test errata, using defaults generally * Validate k8s versions for PSP (Azure#2145) * Update window binary build documentation (Azure#2147) * Update window binary build documentation * move background section * Upgrade docker-engine to 1.13.* for all Kubernetes clusters >= v1.7 (Azure#2144) * update 1.7, 1.8, 1.9 latest to docker 1.13.* * docker-engine update to 1.13.* for >= 1.7 clusters * remove add 3 hours from timestamp, add’l \n (Azure#2149) * remove add 3 hours from timestamp, add’l \n * more \n * language * Add regression tests for 3 masters & 5 masters (Azure#2154) * added multi master configs * fix fmt * Mount in /var/lib/cni from the host. 
(Azure#2165) * Kubernetes E2E: test addons if present (Azure#2156) * conditional addon tests * uses generated model to introspect cluster features * I heart output * deployment flows need expanded cluster definition * reverting to ClusterDefinition for node counts * standard stdout implementation for all commands * typo * disable broken chmod command * stdout tweaks * retrieving deployment error details during upgrade (Azure#1995) * analyze and return deployment status during upgrade * added unittests for DeployTemplateSync * adding francecentral to azureconst generative script (Azure#2164) * add francecentral (Azure#2167) * validation error if custom VNET + Windows (Azure#2168) * Allow 1 core master node VM sizes (Azure#2173) * Allow DS1 Master VM sizes * 1 core for K8s masters * revert constants * update azure consts (Azure#2179) * update prometheus-grafana addon (Azure#2183) * Update clusterdefinition.md (Azure#2171) Fix apiserver options table markdown * Adding ServiceNodeExclusion as a default flag for Controller Manager (Azure#2180) * remove francecentral (Azure#2193) * improve networkpolicy documentation (Azure#2170) * Protect etcd tls from race conditions (Azure#2160) * chown etcd for keys in custom script * remove certs complete * add longer retry for etcd * remove second retry cmd * fix retrycmd_if_failure * add retries * passwd -u “etcd” * fix redirect output * pulling down provision logs during e2e runs (Azure#2190) * pulling down provision logs during e2e runs * setting no deployment retries by default * remove /opt/azure/containers/setup-etcd.sh logs * logs = finding bugs! 
* deleting deployments in resource groups (Azure#2195) * added kubernetes version validation for managed clusters (Azure#2194) * fix-api-server-bind-address-flag - Fixes a flag typo in the api serve… (Azure#2192) * fix-api-server-bind-address-flag - Fixes a flag typo in the api server yaml * fix-api-server-bind-address-flag - update docs with --bind-address fix * Update doc entry for bind-address flag * Network validation checks during provision (Azure#2196) * Add DNS + HTTPS checks, capture DNS packets * ARM doesn’t like ‘{‘ * standardizing retrycmd_if_failure usage patterns * Adding DNS pre-check for aptdocker.azureedge.net * tracking time for each retried provision event * standardizing to 3 masters api model for e2e tests * retain e2e resources for debugging * getting metrics logs from all cluster hosts * improved master/agent host retrieval * lint * lint * Adding “agent” substring to e2e api model pools * invalid agent pool name * revert agent forwarding ssh config * restore cleanup * add agent dns validation * 5 seconds between etcddisk mount retries * Fix DC/OS release version (Azure#2197) * update ApiServerConfig customization/override example (Azure#2201) * idiomatic windows e2e definition (Azure#2206) * don’t abort errors during log gathering (Azure#2207) * Make sure --cluster-dns uses DNSServiceIp set in KubernetesConfig and not always default value (Azure#2078) * Updated broken links (Azure#2208) * E2E nginx outbound access test: simplify port test (Azure#2204) * replace curl with nc * what does output look like on mismatch? 
* We are testing for outbound internet access not web content matching * testing curl for err * real tests, and not installing curl a bunch of times * don’t cleanup k8s e2e clusters (Azure#2210) * Add additional v3 vm sizes permitted for dcos master and agent (Azure#2184) * Cloud init improvements (Azure#2203) * chown etcd for keys in custom script * remove certs complete * add longer retry for etcd * remove second retry cmd * fix retrycmd_if_failure * add retries * passwd -u “etcd” * fix redirect output * remove extra lines * ignore warnings for etcd user changes * Parametrize retry cmd * removed unused data dir * use etcd args * Revert "use etcd args" This reverts commit ccbff6d. * parametrize sleep * changed the retries to 120 for network stuff * Remove Agent NICs dependency on Master NICs during upgrade. (Azure#2213) * replaced apierror with armerror (Azure#2205) * replaced apierror with armerror * addressed comments * addressed comments * reverted change in pkg/api/types.go * Kubernetes provision script: check for kubectl and docker files (Azure#2211) * unnecessary add’l systemctl enable * generalize ensureFilepath * fail provision if etcd check fails * rationalize azureconst (Azure#2215) * e2e ssh cleanup (Azure#2216) * return nil error on successful deployment (Azure#2218) * explicitly check aptdocker.azureedge.net (Azure#2220) * Add more etcd setup visibility (Azure#2214) * add etcd setup log to artifacts * remove hiding useradd output * show output of user add * add check for etcd user * add default audit policy (Azure#2189) * add default audit policy * apiserver audit log rotation is user-configurable * add nc checks to agent (Azure#2221) * Enabling Azure CNI for Windows (Azure#2174) * enabling azure cni * delete overwrite * address comments * address comments * fix kubeStartStr * fix kubeStartStr * remove misc files * squash commits for kubeStartStr * passed final test * rebase cleanup * setting Azure CNI for vlabs only * default back to kubenet * more 
set -x (Azure#2224) * more set -x * send ps to background * timestamps * adding certs dependency in cloud-init * rationalize etcd certs dep * extra ensure_etcd_ready * fixed version checking for managed clusters (Azure#2226) * Enabled preprovisioning on windows dcos agents (Azure#2228) * retry get aptdocker gpg key many times (Azure#2229) * Keyvault etcd certs (Azure#2155) * Use single values for etcdpeer key params * fixed param logic and added logic to vars * remove unused code * only add master certs/keys to params and vars if master is not hosted * move apiserver cert * add master profile != nil check * undo move api server key * Enable cloud controller manager support for 1.9 * Remove debug binary * adding debug to gitignore * minor doc fixes * Fix azure cni service ip (Azure#2237) * enabling azure cni * fix Azure CNI service IP connectivity * fix --auto-suffix when dnsPrefix is defined in apimodel json file (Azure#2239) * E2E: don’t collect logs if soak test (Azure#2240) * don’t collect logs if soak test * this! * Kubernetes 1.9.3 support (Azure#2242) * Add version 1.9.3 * update win zip and re-fmt * Kubernetes 1.8.8 support (Azure#2243) * add k8s 1.8.8 * updated win zip * rebase errata * more rebase errata * Update kuberneteswindowssetup.ps1 for azure cni to remove redundant code (Azure#2244) * enabling azure cni * remove redundant line * Windows RS3 hot fix for k8s (Azure#2230) * wait for certs to start etcd stuff in cloud init (Azure#2245) * set addon enabled value if nil (Azure#2254) * update generateproxycertscript.sh to use secure etcd endpoint/certs (Azure#2252) * enforce apt-get update warnings/errors retries (Azure#2241) * enforce apt-get update warnings/errors retries * Add single quotes around sp secret (Azure#2255) * --use-service-account-credentials=false if no rbac (Azure#2253) * new ubuntu image (Azure#2259) * Kubernetes Tiller Addon: configuration to set max-history (Azure#2217) * Add max-history configuration to tiller addon. 
* Test for max-history configuration for tiller addon. * freshen go-dev image (Azure#2261) * freshen go-dev image * lint * - keeping original DeploymentOperationsListResult in DeploymentError (Azure#2266) * - keeping original DeploymentOperationsListResult in DeploymentError - add DeploymentValidationError to distinguish validation errors * addressed comments * untangle —authorization-mode from enableSecureKubelet (Azure#2267) * untangle —authorization-mode from “secure kubelet” * fix typo * fix monitoring extension and add support for prometheus v2 (Azure#2257) This commit includes the following changes: - fixes the broken monitoring (prometheus/grafana) extension - makes this more resilient in the future, as the chart versions are now static (future to-do item would be to have extensionParameters override these versions) - gives the user and contributor more flexibility by allowing them to pass in a custom url for the prometheus chart values config (this is primarily important for developing and testing away from the Azure/acs-engine repo) * enable AggregatedAPI's by default for k8s 1.9.0+ (Azure#2264) * E2E test - 50 nodes (Azure#2260) * E2E: cleanup legacy kubernetes (Azure#2275) * add e2e hybrid definition also remove tiller explicit config from windows api model * removing windows + hybrid from legacy e2e * removing tests from legacy e2e that are elsewhere * add rescheduler, remove more from legacy e2e * add debug for service URL content mismatch * kubelet —cluster-domain is user-overridable (Azure#2276) * api/vlabs: fix typos in tests (Azure#2280) Signed-off-by: Jess Frazelle <[email protected]> * add prerequisit to have permissions to create service principals in the subscription (Azure#2281) acs-engine hangs with "WARN[0008] apimodel: ServicePrincipalProfile was empty, assigning role to application..." 
if user does not have enough permissions to create and assign service principals ans azure applications * E2E: service LB validations and pod Ready/NotReady (Azure#2279) * debug output if service URL validate error * debugging num retries * rearranging deck chairs * service validate should guarantee service IP * this actually works * improve pod Ready/NotReady check * general hpa foo (Azure#2291) * more time and avoid nil panic (Azure#2289) * New etcd versions and update default to v3.2.16 (Azure#2292) * new etcd versions and set default to 3.3.1 * using 3.2.16 as default * More e2e tests (Azure#2277) * add features on and features off * fix off model * add seperate tests for each feature disabled * move features off dir * rbac bool * added clear containers * added addons enabled test * fix typo in apimodel * remove aci-connector * move addons to default * Don't display "Error: <nil>" on successful deployment (Azure#2300) * E2E Addons (Azure#2294) * add features on and features off * fix off model * add seperate tests for each feature disabled * move features off dir * rbac bool * added clear containers * added addons enabled test * fix typo in apimodel * remove aci-connector * wip add mem/cpu limits/requests checks * add resources to container spec * fix resources type * add checks to tiller * remove extra err var * add check for dashboard and aci connector * update default definition * fmt * fix typo * Refactor resources validation * fix error string * fix linter * remove pointer * fix ineffassign * small fixes * ensure docker installs before ensure docker runs (Azure#2305) * Save apimodel after upgrade (Azure#2306) * cmd/deploy: Handle error due to missing permissions during deploy (Azure#2297) * Handle error due to missing permissions during deploy * CreateRoleAssignmentSimple can already return an error. Use this if a status 403 (not enough permissions) occurs. 
This is opposed to status 404 that seems to be issued to signal work in progress during service principal generation (by arm). * autoFillApimodel: remove the duplicated retry logic of CreateRoleAssignmentSimple. this allows to properly fail if CreateRoleAssignmentSimple returns an error * style fix: gofmt -s * Clarify that only Calico supports K8s network policies (Azure#2270) * using --cluster-domain for kube-dns domain (Azure#2303) * set kubelet defaults for --cgroups-per-qos & --enforce-node-allocatable (Azure#2310) * set kubelet defaults for --cgroups-per-qos & --enforce-node-allocatable * update docs * Updates NVIDIA drivers installation (Azure#2219) * updated NVIDIA drivers installation * linting engine.go * update GPU doc * Revert static IP allocation logic in Azure CNI, PR 1966. (Azure#2315) * add restarts to nvidia drivers download in cloud-init (Azure#2316) * add restarts to nvidia drivers download and only create cloud-init string if necessary * add tests * add v1.8 gpu-enabled api model for e2e testing * trying Standard_NC6 * e2e * lint * updated comment * bad match string, less freq checks, - unused func * more general success determination, typo * more typo * Support multiple AcsEngineClientIDs (Azure#2293) * Support multiple AcsEngineClientIDs * Fix acsEngineClientID assignment * Fix formatting azureclient.go * Fix2 formatting azureclient.go * docs cruft (Azure#2321) we are not actually setting —read-only-port=0 for kubelet * Use FirstConsecutiveStaticIP in original API model instead of resetting it to default during upgrade. * fix quotation in etcd daemon args (Azure#2325) * remove Windows + custom VNET validation error (Azure#2322) * Add isUpgrade flag. * Apply same logic to other routes setting FirstConsecutiveStaticIP. * Reboot etcd fix (Azure#2329) * fix quotation in etcd daemon args * Revert "fix quotation in etcd daemon args" This reverts commit 606bab4. 
* fix reboot by adding systemctl enable service * Remove agent NICs if upgrade master nodes. * Private clusters (Azure#2326) * add isprivatecluster func * wip remove load balancer for PC * add enablePrivateCluster flag * no public IPs for private cluster * working private cluster for 3 masters * remove duplicate iptables cmd * remove useless function * fmt * revert dnsprefix docs change * undo etcd change Move change to a separate PR because it is unrelated * remove masterPublicIpAddress * fix typo * handle DCOS + swarm nil case * add docs and example * replace host by jumpbox in the docs * add instructions to create jumpbox * indents * missing import (Azure#2348) * Allow "v" prefix in orchestrator version and release (Azure#2344) * fix quotation in etcd daemon args * Revert "fix quotation in etcd daemon args" This reverts commit 606bab4. * trim v in orch ver / rel * add unit tests * Improve the instructions for AAD. (Azure#2330) * Improve the instructions for AAD. * broken link and syntax * typo (Azure#2342) * Fix master resources merge conflict (Azure#2353) * apply azure CNI static IP revert * Improve info to get issuerurl (Azure#2356) * circleci: compile as separate step (Azure#2350) * Improve code blocks (Azure#2335) * Update Azure Gov ACSEngineClientID (Azure#2352) * add ClustrRole & ClusterRoleBinding for azure file (Azure#2238) add as cluster-service for azure-cloud-provider * mount /sbin/apparmor_parser if PodSecurityPolicy is enabled (Azure#2320) * mount /sbin/apparmor_parser if PSP * this is the correct kubelet service file * this is the correct sed command * /sbin/apparmor_parser already exists * Allow a default k8s version for loading agentpool-only clusters (Azure#2357) The defaultKubernetesVersion argument will be used if Properties.KubernetesVersion was empty. 
* Private clusters iteration 2: change the server for the cluster kubeconfig (Azure#2354) * modify 2nd kubeconfig for private clusters * typo * fix customscript kubeconfig * revert change in custom data kubeconfig * update docs for private clusters (Azure#2363) * Remove 1.6.x upgrade tests. (Azure#2364) * Add k8s 1.7.13 support (Azure#2369) * add version 1.7.13 * update win zip * Resolve merge conflict in building 1.7.13 (Azure#2370) * Resolve merge conflict in building 1.7.13 * Add comment * Set custom UbutuImageConfig for gov (Azure#2375) * Fix guid validation (Azure#2373) * metrics server addon (Azure#2339) * metrics server addon * use addonmanager mode EnsureExists * fix labels on metrics APIService * enable hpa autoscale test for 1.9 clusters * Reuse GetCloudTargetEnv in FormatAzureProdFQDN (Azure#2376) * reuse GetCloudTargetEnv in FormatAzureProdFQDN * Fix FQDNFormat lint error * minor fix in build script (Azure#2379) * rationalized vendor/ (Azure#2390) * remove unnecessary hyperkube reference (Azure#2391) * Remove deprecated '--require-kubeconfig' for k8s (Azure#2365) * Remove deprecated --require-kubeconfig * adding —require-kubeconfig back to 1.7j * less than is what we want here * Pass --location to containerService (Azure#2381) * Notes on day-to-day operations on an acs-engine cluster (Azure#2351) * Notes from my experiences.. .. over the last couple of days * Rename day-two-operations.md to kubernetes-day2-operations.md * add etcd certs to KV docs (Azure#2396) * upgrade tiller to 2.8.1 (Azure#2397) * remove k8s 1.5 related code/artifacts (Azure#2394) * blocking cse on cluster nodes ready (Azure#2225) * blocking cse on cluster nodes ready * deal with agent-only clusters * use kubectl var and ignore stderr * increase node active check timeout to 30 mins * test single master node Windows clusters (Azure#2402) * a miss * fix 2 more missed error * remove unnecessary * remove more unnecessary
The current DCOS templates are not allowed to use VM sizes like D2s_v3. We very much need those.