Hold off and redesign Declarative Intent #70
Comments
+1 I agree with the language of intended vs expected, and that there needs to be great consideration given to how these two are integrated.

I think this is a bit too much of a semantic discussion, 'expectations' vs 'intentions': you 'expect' things to result as you 'intend'. In any case this already boils down to what the main project has always advocated, 'declare' vs 'mandate', i.e. that we phrase options and develop actions in a way that is declarative rather than the traditional imperative. So anything in a 'state=present' is expected AND intended to have those values; I see no need for an extra 'expect' keyword, as all modules should already 'declare expected/intended state'.

@bcoca This is why this discussion needs to happen. Configuration state is not operational state, and thus the confusion. Please do not dismiss this as a semantic discussion.

@bcoca: Except that in this case state and txrate are outside of our control. We cannot control them, hence it is not an intention. The documentation states "Ansible is not a monitoring tool", but that's what these parameters are really for: to monitor whether these values are exactly as we expect them to be. (And yes, the use of state is highly confusing in these examples, which makes it even worse overall, but it is a good example of all the things that are wrong with this functionality.)

@itdependsnetworks That is the one part I see having merit, as 'intended' vs 'active' state is normally not as prevalent in servers as it is in appliances, i.e. normally we restart services as soon as possible to remove the difference. @dagwieers If you are using parameters for comparison and not in an effort to affect the actual state, I would argue that is an incorrect usage of
@bcoca So you agree with me, without knowing it :-) There are 3 options:
Now it is bolted on without understanding the consequences, long-term vision or cross-domain insights.

@bcoca I think the networking space is largely different, and your word carries a lot of weight, so I expect "not as prevalent in servers as it is in appliances" to be taken as true often. Whether it is this, or calling filters that I use daily "extremely specialized", it can ultimately undermine Ansible producing the best product possible.
@bcoca From the DI documentation:

So it's more like an assert included in the module. We might as well do:

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
    assertions:
      link: connected
      tx_rate: 10gb
```

And make this directive part of every module. It would make writing integration tests easier :-) (i.e. no need to register and program conditionals in failed_when)
@itdependsnetworks I don't agree; even on servers there are states that we cannot influence directly. The state of a cluster, the network link or negotiated speed, ... anything that is a given and could be changed outside of our control is worthwhile to validate. The question is whether this should be some special parameter, a return value that we can check as part of the task, or a separate assert-task (as we do with integration tests).

I think my point was missed: we are getting stuck in the semantics vs the 2 real differences we have here.
@bcoca I think the people joining this thread are so far in agreement that there is an issue with creating a "check against" type and are trying to find a better solution that

Not to give any preference to @dagwieers' suggestion, but I do see where the
My preference is to just use include_role/tasks to do the same operations, as this is basically 'get facts + compare/assert' .. and I know they are not without problems, but I would prefer to put in the effort to finish bringing them up to spec vs creating a whole new breed.
Just a quick historical note: DI was conceived and much of it built before we had persistent connections. It was needed to overcome some of the limitations we had with the connection module (pre-persistence). Now that is no longer the case, we have the opportunity to reconsider this from the ground up. I think I would be more in favor of @bcoca's suggestion of better facts + asserts long term.

@privateip This is confusing me, because we announced this at AnsibleFest SF last week to the complete audience as part of the v2.4 innovations in the networking space. So are we announcing things we already know we don't want to do going forward?
No, the work was done and we decided to move forward with it for networking modules. My comments are in the context of considering a more broadly implementable solution across all of Ansible.

That's what I think is not a good idea. I don't think we want different Ansible concepts depending on the related technology domain. It's too confusing IMO.

Agreed ... if we adopt an Ansible-wide solution to this I'm fully on board with refactoring the network modules to align with the strategy.

The Core Network team has agreed that we will not add the technical guide (ansible/ansible#26954) for Declarative Intent into 2.4.0, to avoid confusing the end users.
@bcoca Can you please elaborate on the include_role/tasks approach that you proposed in the above comment?

@ganeshrn Basically, just use tasks to do the same thing vs having to encode it into an action/module.
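For illustration, a minimal sketch of that task-based pattern: a fact-gathering task followed by an assert task. The net_interface_facts module name and the fact keys it returns are assumptions used purely for illustration, not an existing guaranteed interface:

```yaml
# Sketch only: the facts module and the fact names it returns are assumed.
- name: gather operational facts for the interface
  net_interface_facts:
    name: GigabitEthernet0/2
  register: iface

- name: assert the operational state matches expectations
  assert:
    that:
      - iface.link == 'connected'
      - iface.tx_rate == '10gb'
```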
I like both bcoca's ideas and dag's ideas at the moment. bcoca's idea could be done without code changes. But it means we need to start shipping and supporting pre-baked roles. What does that look like? dag's idea is a much nicer UI than the current declarative intent proposal. It requires changes to the playbook language. However, there's no information on implementation so there's more design work to do before we could spec out just how much work this would be (and the caveats).
The main debate seems to center around creating new functionality built into modules, relying on existing capabilities that can be handled by registering the output of a task and testing it with another task, or coming up with an elegant way to leverage existing capabilities in Ansible but with a better UI for modules. Validating the operational state of a system after Ansible makes changes has always been possible, but it's not done as commonly on non-network operating systems: just trust the module to change the system, and trust the system to change its state. For the times when that doesn't get the job done (problematic services that don't really start, or DBs that take a while to become ready), we already have the trappings of these types of things. I think this could be a shorthand/alternative syntax for that. Etherpad for reference/real-time collaboration.
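As a reference point, a minimal sketch of that existing register/retry pattern for "things that take a while to become ready"; the health-check URL and expected status are placeholders, not anything from this thread:

```yaml
# Poll a readiness endpoint until it answers, instead of trusting the
# service to be up immediately; URL and expected status are illustrative.
- name: wait until the service actually answers
  uri:
    url: http://localhost:8080/health
  register: health
  until: health.status == 200
  retries: 30
  delay: 10
```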
I would say modules already should check that the desired state is achieved; in most cases this is done by checking the 'success' of the tool/api/function call invoked and trusting THAT. What they don't normally do is arbitrary 'extra checks' outside of the commands/API they use. I'm remiss about adding something like this built into the language as it will always be more limited than what a 'full fledged task' can do. Even something like an 'assert strategy' that can interject tasks before/after the execution of each task should be a lot easier to use and implement (openshift/stack has one already).
@bcoca I agree with most of what you say; my only objection (and for me it trumps mostly everything else) is convenience and complexity. Writing a role per task for this seems inconvenient, and even if these roles already exist (for every network device?) there is a burden (selecting, trusting, managing roles) and complexity in understanding and troubleshooting issues when they arise. If you abstract away through roles, and it fails, the error throws you into that complexity. And the great strength of Ansible is that it makes hard/complex tasks easy through a simple interface. Do we move the complexity into roles (user-facing) or into the core (abstracted away in a single task)?

If my example was too limited compared to assert, that's solved by doing this:

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
    assertions:
      - this.link == 'connected'
      - this.tx_rate == '10gb'
```

True, it doesn't offer the same as one or more tasks could offer, but maybe it fits 80% of the use-cases, and the remaining part can still add the required tasks.
A specific example of a situation where the result of the module may not be what we want: when we create an RDS instance, we have the option to wait up to a timeout. If we do wait, then sometimes the RDS instance is up and running as expected; sometimes it takes longer, up to over 10 minutes. It would be very useful to have actions like creating a database in the DBM instance happen at the point when the RDS instance has finally started working. I currently do exactly that: create the instance without waiting, then do some other slow actions (create ec2 instances), then come back and wait for the instance to be up before continuing with further dependent configuration.
My proposal would be something like:

```yaml
- name: check that interface is ready
  assert:
    module: net_interface_facts
    link: connected
    tx_rate: 10gb
    retry_timeout: 200
    assertion:
      - "succeeded|true"

- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
```

The "assert" (not sure if it should share a name with the normal assert, I currently think so) becomes a plugin that runs other modules. The modules can be fact modules or standard modules run in check mode. It keeps retrying until it has success or times out. By running it without retry it simply fails and stops the script. This can then be used with any existing module which supports check mode (or is a proper facts module) and at the same time solves both the verification and the "make sure it's happened" cases.
@abadger, @dagwieers One of the advantages of implementing this in roles/includes is that users don't need a programmer to extend or add these behaviors; a few 'reference implementations' is all that is needed for a much wider audience to start producing coverage for more cases. That said, looking at dag's example, yes it is nice and concise, but I would argue that it introduces many 'uncontrolled' questions:

I would argue there is a simpler way to do the same thing, which just needs some extra support in modules. I'll use systemd as an example as it already returns 'state after action' information in its response:

```yaml
- name: restart tomcat
  systemd:
    name: tomcat
    state: restarted
    enabled: yes
  failed_when:  # basically a 'reverse assert'
    - result.ActiveState not in ['Active', 'Activating']
    - result.UnitFileState != "enabled"
    - result.UMask != "0022"
```

Like with @mikedir's 'assert wrapping modules', this would require minor modifications to modules to return the 'after action state', probably with a toggle as this might be pretty large and time consuming in some cases. If we really need to 'build in assert' I believe this is probably the most effective way, and it should be less onerous on users as it does not duplicate
+1 to @bcoca's proposal. We are essentially doing a form of this today with
@bcoca This is very close to what I am proposing. The difference is that it would mean you can control how ansible-playbook behaves by default, and on demand. (See the discussion above about this being the state of the cluster, or the state of an interface link.)

This example expects the information to come from the module. So yes, the module exposes this additional information related to the object being managed. (In the case of network modules, it would implement the required delay to intercept possible changes related to the task, just as it was with DI.)

It is still one task, so yes I guess.

That is up to the user, whether they want this to be a possible failure case or not. I can see use-cases for both failing on assertion failure and continuing with the playbook.

Either it works or it fails (and the user wants it to fail). If the user does not want to fail on assertions, exposing a warning seems useful.

It would work because it's just a task with some additional handling on exit.
BTW I always wanted to make the proposal to introduce `this` as the default registered variable for the task result values, which you could use with changed_when, failed_when and now assertions:

```yaml
- command: foo bar
  changed_when: this.rc in [2, 3]
  failed_when: this.rc >= 4
```

It avoids having to register and is much cleaner if you don't need to expose the result to other tasks.
@dagwieers So I think mixing up 3 features takes us nowhere:
@bcoca We need a way to influence whether the assertions fail or not at run-time. Sometimes you want to test the operational state as part of the playbook (because you want to verify things are exactly as you like), but at other times you want to perform all tasks regardless of the operational state, because you can fix those after running that playbook. This means we need a run-time flag to influence the behavior (fail, warn or ignore?), and a default behaviour.
Talking about use-cases and user interfaces makes perfect sense to get to something useful; it doesn't matter that we are building new concepts if they are useful to a wider audience. Not sure why you consider this taking us nowhere. Kind of condescending, especially if you ignore half of what I am saying and adopt the other half :-) I will make the proposal for this and we can consider it as a given here.
I'm not sure how that is condescending; I'm just pointing out that if you keep adding features on top of the existing discussion, it will extend infinitely, and discussing each feature separately will limit the scope and length of such discussions.
You can't design an equivalent solution for DI by discussing each "feature" independently. That will bring us nowhere, because we don't know yet what we want. (Look at how this thread evolved from top to bottom.) The condescending part is that you ignore half of what I am saying.

No, ignore_errors does not help with controlling these assertions at run-time. It does not separate a real module failure from the operational state. Take the original example:

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
    assertions:
      - this.link == 'connected'
      - this.tx_rate == '10gb'
```

You may want to run it with the behavior that a failed assertion may be ignored, or should be a warning. (Some flag, I don't know how to name it.) In your case:

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  failed_when:
    - this.link != 'connected'
    - this.tx_rate != '10gb'
```

What if there is no link on one interface, you're running this in production, and a lot of important configuration follows? What if this is an urgent operation in a maintenance window? Sure, the fact that we have a link down is something that needs to be looked at, but does it warrant stopping the complete execution? I don't think so. A bad cable should not block you from performing other actions. So let's add ignore_errors to the playbook; now it will continue even if the module fails hard for things we do control and that should work. Operational states are a different beast than configuration items we control ourselves, and they may need to be handled differently.
Well, ignore_errors is also templatable and can handle complex conditions, but there is no need to go that far when failed_when can also do the same:

```yaml
failed_when:
  - assertions_fatal|bool AND ( this.link != 'connected' AND this.tx_rate != '10gb') OR this|failed
```

In any case, your examples did not show an additional 'toggle' for assertions being fatal or not, which should not be limited to a global scope as you might want to handle that at a block/role/include/task level also, so you need another keyword for that too.
Right, but then it becomes an inconvenient and very custom thing. Again, that's where designing this as a new concept in core makes it easier and more valuable for people to use. That's how we evolved from custom shell scripts, to templated playbooks, to using Jinja2, to proper YAML playbooks and various neat features. For the same reason I think using roles, while technically possible, is inconvenient and overly complex. The proposal is still growing as we discuss, and it depends on naming, functionality and use-cases. Whether we want to handle this per block/role/include/task is something to discuss. My use-cases wouldn't require it, but I don't want to force my use-case on everyone and wouldn't mind if this is configurable at every level. (Remember this was all to find an alternative and more versatile solution for Declarative Intent, so I would like to learn from the Networking people how they would be using this before making any decisions. However, I can see myself using this in playbooks and for integration testing.)
Agreed with @dagwieers: it is certainly possible to do this with roles, just inconvenient and overly complex.

I would argue it is less complex with roles; you are just shifting the group of people that can handle this from people that can write plays/roles to developers of the modules and core engine. No matter which avenue, we need to write this for ALL the OSs/functions/etc, and I find that creating 'reference roles' as I mention above will get us there a lot sooner than 'reference modules' and core features.
Also collecting pros/cons for each solution in https://public.etherpad-mozilla.org/p/ansible-proposal-70
I think one thing to remember is that this functionality already exists. @bcoca is your intention to remove this functionality?

What would be useful is if everyone[1] would please describe what they believe the problem we are trying to solve is.

[1] Yes, that includes you

As a network engineer running a playbook, how can I ensure that at the end of a playbook run my network is configured correctly? Given that my network contains things that Ansible cannot configure (external factors), such as ensuring that a physical cable is plugged in, links to a switch, and that the link is UP. If a check fails (cable unplugged) I want the ability to either roll back the configuration or fail at that point.
1 - Updating the operational state to be a dict while keeping the same functionality

All,
I'm going to drop my thoughts:
How about something like:
That way we could use failed_when as the current 'fail in case this list of assertions fails', but we could also augment it with finer-grained control to say 'OK, in general I don't want to stop the play if asserts are false, just warn (the message could be configured on an assert-by-assert basis), BUT treat each one of the assertions individually as they may change behaviour as needed'.
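A rough sketch of what that finer-grained, per-assertion control might look like; the assertions/that/behavior/msg keys are hypothetical syntax for illustration, not an existing Ansible feature:

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  assertions:                        # hypothetical keyword
    - that: this.link == 'connected'
      behavior: fail                 # stop the play if this assertion fails
      msg: "link is down, check cabling"
    - that: this.tx_rate == '10gb'
      behavior: warn                 # only warn, keep running the play
      msg: "negotiated speed is not 10gb"
```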
A few things about the 'roles approach':

So I would dedicate a short time to creating a couple of these roles, share them on Galaxy and see what happens. Even if you want to build this into the core language and update every module to support it, the roles would still work regardless and provide feedback from users in a way that is quick and easy to respond to.
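For instance, a hedged sketch of how such a reference role might be consumed from Galaxy; the role name and its variables are assumptions for illustration, not an existing role:

```yaml
# Hypothetical reference role; name and variables are illustrative only.
- name: verify interface operational state
  include_role:
    name: net_verify_interface
  vars:
    interface_name: GigabitEthernet0/2
    expected_link: connected
    expected_tx_rate: 10gb
```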
Also, failed_when could expose a delay param, which would effectively mean 'wait $delay time before gathering facts and checking assertions'.
@dag This is no longer being considered a viable implementation. We are moving to roles to handle this implementation which will be decoupled from Ansible core. Any objections to closing this proposal at this time?

@privateip No, makes perfect sense.
Proposal: Redesign Declarative Intent
Author: Dag Wieers <@dagwieers> IRC: dag
Date: 2017/09/15
Motivation
During AnsibleFest SF 2017 it was clear that the Declarative Intent functionality that was designed as part of the ansible-network roadmap is flawed in a few areas.
This proposal does not question the merit of this needed functionality, but some design decisions require improvements before this is ready for wider consumption.
The implementation mixes the configuration namespace with the expectation declarations, which is going to be very confusing. Not just because some of these parameters are used elsewhere for configuration, but simply because everything is a top-level module parameter. This is bad user interface design. The remark that the documentation will make clear which parameters are part of the configuration declaration and which parameters are expectations does not make this acceptable.
The example from the documentation:
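(Reconstructed here from the parameters discussed in this thread; the exact documented example may have differed slightly:)

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
    state: up
    tx_rate: 10gb
```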
So state here is an expected state of the interface, and tx_rate is the expected transmit rate, not an actual declarative configuration or even an intention as such. A better interface would be:
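(A sketch of that better interface, grouping the expectations under a separate keyword as in the examples used later in this thread; the keyword name itself is still open:)

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
    assertions:
      state: up
      tx_rate: 10gb
```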
Declarative intent seems to indicate we are declaring our intentions. But that's not what it does; it declares expectations rather than intentions. If we look at the above examples, the items listed are not (as documented) the "Intended state", but rather the "Expected state" an interface is in.

"Intended state" seems to imply it is something we can control, whereas "expected state" better describes what it is. An intention is something one contributes to; an expectation is something we consider getting back. So the functionality is a misnomer in itself.
From the documentation:
There is another problem with doing both configuration and expectation checking in a single module. If one of the two operations fails, how would we know whether the configuration failed or the expectations were not met? And should a failed expectation be a module failure or not? I can see different use-cases for testing expectations and, as such, it would be more versatile if expectations were handled by special tasks rather than being part of the existing modules.

Or maybe the core Ansible product should understand the difference between configuration and expectation, and could be told to either ignore/warn on expectation failures, or follow a different failure path, rather than burying the handling inside the module. Maybe you don't want to run expectation tests in some runs (e.g. production updates), but you want them in other runs (like integration tests, or compliance testing).

This functionality is useful in a wider context than just networking, so this should have been discussed with use-cases outside of networking as well. And it maps onto the existing assert module and registering the return values. If we agree on doing a validation check as part of the call, we could leave it up to Ansible core to do the test (and take the appropriate action):
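(A sketch along the lines of the examples above, with the assertions evaluated by Ansible core against the task result; this is proposed syntax, not an existing feature:)

```yaml
- name: configure interface
  net_interface:
    name: GigabitEthernet0/2
    description: public interface configuration
    enabled: yes
  assertions:
    - this.link == 'connected'
    - this.tx_rate == '10gb'
```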
This would then make writing integration tests easier, and ensure that other modules automatically have this functionality without requiring any special modifications. This is a much stronger use-case than the original implementation.
Problems
What problems exist that this proposal will solve?
Solution proposal
Documentation (optional)
Hold off on documenting and promoting this as part of v2.4, so we don't have to support it going forward in light of a user-breaking redesign.
Anything else?
The above feedback was provided during the AnsibleFest SF 2017 Network Working Group and was largely unanimous among participants.
This proposal was made to collect feedback from other participants in order to design a better user interface for this functionality before it becomes a wider standard.
Also collecting pros/cons for each solution in https://public.etherpad-mozilla.org/p/ansible-proposal-70. Please add your thoughts here and they will be rolled into this proposal doc by @gundalow.