-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto creation of template #639
Conversation
|
||
func (beat *Beat) loadTemplate() error { | ||
|
||
// TODO: Fin a way to check if flag was even set. If it was set but no path, default path should be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that just template != nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that doesn't work for string, ints etc. because of the default which is a string "". I don't think I can set nil as default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, I was hoping it sets the pointer to nil if the flag is not used, but that doesn't seem to be the case. Maybe we'd be better with having the setting in the configuration file? I find that more friendly to packaging, config management, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. In general I also prefer to all the settings in the config file. The only "issue" here is that this should be done only once. But as it does not have any affect if it is applied multiple times, this should also not be an issue.
As this config belongs to the elasticsearch output, I suggest to also place it there. Something like
output:
elasticsearch:
#template: path
If template is not configure, default path is taken. If template is uncommented an no path is provided, template will not be loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsg One thing that came to my mind is that the template will be loaded multiple times in most cases anyways, as there are multiple beats on different machines ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewkroh That's a good idea. I actually have to check if this could be done with one call. As in case lets say 20 beats start up and every beat sends the template, it is quite likely that between a HEAD request and sending the template, and other one already did it.
Also the idea came up to use versioning for the templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get it right, this is also what LS does: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template We might want to check the source code to make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the beats that are sending data to the same Elasticsearch have different versions and different templates? If only one beat is loading the template, it might be that an older beat is expecting an older template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to make sure all the beats are running the same version. Otherwise raise an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monicasarbu The idea here was to version the templates. But I still need to create our concept here.
I had a look at the template for logstash-output-elasticsearch and there would be two things which would have to be added:
Currently it is assume, that all templates start with logstash-*. We could recommend people who use filebeat with logstash to use the ls index to solve this issue as I think it is not possible to set 2 template patterns. |
reader := bytes.NewReader(content) | ||
|
||
|
||
client.LoadTemplate(config.Template.Name, reader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of just forwarding as is, do we want to do some validation?
@ruflin I was thinking we'll just tell people to configure the Beats templates in the logstash-elasticsearch-output. When using the beats with logstash, you need a pretty special logstash configuration anyway. |
@tsg If we go this route it would probably make sense to move the more generic stuff from the logstash template also into the filebeat template. I assume people using LS will do some transformations etc., means having fields which are not part of our template yet. |
@@ -40,7 +40,22 @@ output: | |||
# Scheme and port can be left out and will be set to the default (http and 9200) | |||
# In case you specify and additional path, the scheme is required: http://localhost:9200/path | |||
# IPv6 addresses should always be defined as: https://[2001:db8::1]:9200 | |||
hosts: ["localhost:9200"] | |||
hosts: ["192.168.99.100:9200"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will be removed again. Only for testing.
@ruflin right, we might want to integrate pretty much all of LS template into Filebeat's template, also in preparation for the ingest node. This would be worth a new discuss ticket? |
logp.Debug("test", "Test: %v", config.Template) | ||
|
||
// Check if template already exist or should be overwritten | ||
if !esClient.CheckTemplate(config.Template.Name) && config.Template.Overwrite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition with CheckTemplate
and LoadTemplate
when using multiple Beats (potentially of different versions) but I don't know of any way to work-around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think there is a potential conflict. For different versions the solution is probably to version the templates and indices. I recently updated the issue and add a note about the version problem: #639 The initial solution would be to not make it automatic.
Regarding the versioning issue, it sounds to me like the only upgrade-safe way to do it is to encode the "schema" version in the index names, like Marvel does. I wonder if in Phase 1 we shouldn't load the template by default with Regarding the Logstash setup, how about option 3, Filebeat manages the template directly with Elasticsearch. Architecture wise, we said before that it's fine if in a *beat -> Logstash -> Elasticsearch, *beat queries Elasticsearch directly for configuration, topology information, own metrics, etc. I see the template management similar to that. I see a challenge with this option is how to organize the configuration file. |
I think it is a good idea to go in Phase with overwrite: false but having a warning / error message in log. The challenge with option 3 is as you mentioned the config file as it cannot depend on the elasticsearch output as it would then also send events to it. But as we have to think about this anyways with configuration and metrics lets add this to it. I like the idea and will check if I can come up with a good solution. |
We have decided to go with Phase 1 first and overwrite: false by default. PR will be updated. |
689f236
to
37dcc78
Compare
For changing the documentation I created a follow up issue here: #862 |
@@ -58,6 +58,7 @@ https://github.com/elastic/beats/compare/v1.1.0...master[Check the HEAD diff] | |||
- Update builds to Golang version 1.5.3 | |||
- Add ability to override configuration settings using environment variables {issue}114[114] | |||
- Libbeat now always exits through a single exit method for proper cleanup and control {pull}736[736] | |||
- Possibility to create elasticsearch mapping on startup {pull}639[639] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing to: Add ability to create Elasticsearch mapping on startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@dedemorton Doc adjustments done. |
@@ -151,6 +151,9 @@ filebeat: | |||
# Event count spool threshold - forces network flush if exceeded | |||
#spool_size: 2048 | |||
|
|||
# Enable async publisher pipeline in filebeat (Experimental!) | |||
#publish_async: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this option here by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was automatically generated, so I think it went missing in one of the previous updates.
LGTM. |
LGTM. The organization of a few of the imports don't match up with how goimports organizes them. |
@andrewkroh Will clean that up in a later PR. |
Problem
Each beat loads structured documents into elasticsearch. To make sure every document has the correct mapping, before starting a beat the predefined mapping should be loaded. Currently this is a manual step and is often forgotten. This can lead to problems as elasticsearch automatically assumes types. A mapping can't be changed anymore at a later stage.
See also https://github.com/elastic/libbeat/issues/62
Versioning
An additional problem is, that the templates can change with the different versions of a beat. This means having data of 2 different beat versions can lead to problems.
Logstash Template
Logstash provides its own template to elasticsearch as part of the logstash-output-elasticsearch. This is a generic template and does not necessarly cover all cases from the beats. The template applies to all indices starting with logstash-, by default the beats templates apply to all indices starting with beatname-
Proposed solution
To not have to implement all possibilities with the first version and better understand on how the feature will be used, I suggest to split it up in two phases:
Phase 1 - Manual phase / opt in
Phase 1 will be fully backward compatible. It allows the user to load the template if he configures it so. By default, the template will not be loaded.
Phase 2 - Automated / versioning
Phase 2 always loads the template on startup and adds versioning for each template, so no conflicts between the different template versions happen.
Configuration
The configuration will look as following. It is part of the elasticsearch output as a direct connection to elasticsearch is needed to apply the template. People using Logstash for example for filebeat must apply the manually or use the logstasth-elasticsearch-output.
This configuration is taken from Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template
Versioning
To support the different version of templates, each template must be versioned. In the first version, by default templates would not be applied. In a second version when versioning of templates is in place, templates should be loaded automatically. Versioning is required for the automated behaviour as otherwise this would lead very soon to conflicts. With versioning in place, it is possible to keep using older beats running and writing to indices with the old mapping and having a newer beat running at the same time, writing in new indices with the new mapping.
To solve this problem, it is probably not only required to version the templates but also the indices, so the correct templates can be applied. It should be checked with Kibana / Marvel / Logstash on how they are solving this problem.
Logstash
Logstash has with the elasticsearch-logstash-output its own plugin which already applies a template to elasticsearch. This template is a general template and is intended for all indices starting with
logstash-
. Especially filebeat is normally connected to LS and sends data to ES over the output plugin. Depending on the LS configuration, either the index pattern logstash- or filebeat- is used. The question is which mapping should be used when sending data over LS.Option 1
The disadvantage of option 1 is, that it only applies to filebeat and not the other beats. The assumptionis made, that the other beats send data directly to ES.
Option 2
The main disadvantage here is that the mapping has to be applied manually. People updating a beat will probably not apply the updated (versioned) mapping. The advantage is that it works with all beats.
Suggestion
I would suggest to go with option 2, as filebeat has one of the simplest patterns of all beats, means there will be very rare BC breaks and in the long term, also nodeingest could be used for filebeat.
Note
Similar to logstash, there should be a generic base pattern provided by libbeat which applies to all beats.