Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ingester (and kafka) support #168

Merged
merged 14 commits into from
Jan 28, 2019
Merged

Add ingester (and kafka) support #168

merged 14 commits into from
Jan 28, 2019

Conversation

objectiser
Copy link
Contributor

@objectiser objectiser commented Dec 14, 2018

This is an initial suggestion for how we can easily switch from a simple production deployment (e.g. collector and query using a common backend persistent storage) to one where Kafka is used as a buffer between collectors (publishing to Kafka) and ingesters (consuming from Kafka and storing in the specified storage).

The difference between the two configurations would be:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200

and

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  ingester:
    options:
      kafka:
        brokers: 127.0.0.1:9092
        topic: jaeger-spans
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200

This is a simple change to the configuration, i.e. defining the ingester options, but not necessarily clear this results in a more complex architecture.

Issues to consider:

  • Both the collector and ingester would have storage definitions - whereas the CR only has one area to define these. We could list storage types, e.g. type: elasticsearch, kafka with options for both, but it is still not clear which should be used in collector and ingester.
  • We need an ingester node in the CR, equivalent to the other top level nodes, to be able to override the image - but these are just used to define information about the individual components. Not necessarily how those components collaborate with each other.

So the question is whether defining the ingester should imply using the collector->kafka->ingester approach and that the storage option therefore moves to the final component in the chain - i.e. ingester rather than collector.

If the existence of ingester options is not considered appropriate, then I would recommend we define more explicit strategies:

  • allInOne as now, single deployment with agent, collector and query
  • collector and query with shared storage
  • collector->kafka->ingester and query with shared storage

(what we name the 2nd and 3rd strategies is another matter)

I think this is actually the best approach, as it is clear what architecture the user wants, and the CR can then be verified to ensure it has all of the required information defined for each of the components. It also means that the ingester node is optional - as the kafka options could just be listed in the storage.options, but would still result in the ingester being configured.

Feedback on the current PR and proposed change would be appreciated - however I might not work on it much until new year.

(Also requires e2e tests)

Signed-off-by: Gary Brown [email protected]

@jpkrohling
Copy link
Contributor

This change is Reviewable

@codecov
Copy link

codecov bot commented Dec 14, 2018

Codecov Report

Merging #168 into master will increase coverage by 0.24%.
The diff coverage is 98.83%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #168      +/-   ##
=========================================
+ Coverage   96.46%   96.7%   +0.24%     
=========================================
  Files          30      32       +2     
  Lines        1469    1638     +169     
=========================================
+ Hits         1417    1584     +167     
- Misses         40      41       +1     
- Partials       12      13       +1
Impacted Files Coverage Δ
pkg/apis/io/v1alpha1/jaeger_types.go 100% <ø> (ø) ⬆️
pkg/deployment/collector.go 100% <100%> (ø) ⬆️
pkg/deployment/ingester.go 100% <100%> (ø)
pkg/strategy/controller.go 100% <100%> (ø) ⬆️
pkg/strategy/streaming.go 96.36% <96.36%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c3b92f...c21b0c2. Read the comment docs.

storageType := c.jaeger.Spec.Storage.Type
// If ingester options have been defined, then change storage type
// to Kafka, and the storage options will be used in the Ingester instead
if len(c.jaeger.Spec.Ingester.Options.Map()) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, do we assume ingester options have to be provided?

Or in other words are not we supporting the case where no conf props are needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the easiest approach, as I believe ingester options will always be required - but is not intuitive. As mentioned in the PR description, I think a better approach would be to have more distinct strategies that effectively define the architectural structure being used. So all-in-one, another for the current production setup, and a new third option used to represent the kafka based approach.

dep := collector.Get()

envvars := []v1.EnvVar{
v1.EnvVar{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: types can be omitted here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applies to other places in this repo as well


// NewIngester builds a new Ingester struct based on the given spec
func NewIngester(jaeger *v1alpha1.Jaeger) *Ingester {
if jaeger.Spec.Ingester.Size <= 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to normalize func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but currently this is consistent with the other components. So should be handled in one go in another PR.

}

// Get returns a ingester pod
func (i *Ingester) Get() *appsv1.Deployment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about consistency here...

The other creators in this repo returns object not pointer. Maybe a better to standardize and include func which check if the ingester is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly - but the other deployments in this package (query, collector, agent) all return pointer. So this is consistent with the other deployments at the moment - so if going to be changed, it should be done in separate PR for all components.

@objectiser
Copy link
Contributor Author

This PR is on hold until @jpkrohling returns so we can resolve how the operator should know when ingester is being used.

@jpkrohling
Copy link
Contributor

I trust your judgment about which approach is better. Initially, I would say that having the ingester options are not a problem, but I agree that it's cleaner to have the three options as you listed them. The main motivation as I see it is that we could implement new features specific for the strategy in a way that is transparent to the user. This means that we are making the operator more opinionated, which I see as a good thing.

About the naming, how about:

  • allInOne
  • production
  • productionHA (highly-available), implying that it's production + other stuff

@objectiser
Copy link
Contributor Author

@jpkrohling I was hoping we could move away from production (so deprecate it) and use two more appropriate names, as productionHA doesn't really describe the kafka based architecture, as the production strategy can also be HA.

So I think we are agreed that the strategy name is required for kafka/ingester - but just need to resolve the actual names to use.

@objectiser
Copy link
Contributor Author

@jpkrohling Created a new strategy called streaming which seems appropriate - so we could leave production for now, although in the future we may want to consider a rename, as it implies it is the only production ready architecture.

If this approach seems reasonable, then next step would be to add some e2e tests and docs.

@jpkrohling
Copy link
Contributor

+1 from my side!

README.adoc Outdated

The agent can be injected as a sidecar on the instrumented application or as a daemonset.

The query and collector services are configured with a supported storage type - currently cassandra or elasticsearch. Multiple instances of each of these components can be provisions as required for performance and resilience purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/can be provisions/can be provisioned/?

README.adoc Outdated

=== Streaming

The `streaming` strategy is designed to augment the `production` strategy by providing a streaming capability that effectively sits between the collector and the backend storage (e.g. cassandra or elasticaearch). This provides the benefit of reducing the pressure on the backend storage, under high load situations, and enables other trace post processing capabilities to tap into the real time span data directly from the streaming platform (kafka).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/elasticaearch/elasticsearch/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry should have re-read :)

@objectiser objectiser changed the title WIP: Add ingester (and kafka) support Add ingester (and kafka) support Jan 24, 2019
@objectiser
Copy link
Contributor Author

@jpkrohling @pavolloffay Ready for re-review. Will follow up with e2e tests in another PR. Have locally tested with strimzi.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

The e2e test is missing, but it's coming as part of a future PR.

Reviewed 3 of 10 files at r1, 4 of 11 files at r3, 6 of 10 files at r5, 2 of 4 files at r6.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @objectiser and @pavolloffay)


README.adoc, line 190 at r6 (raw file):

The only additional information required is to provide the details for accessing the Kafka platform, which is configured in a new `ingester` component:

[source,yaml]

Would it be too much to include a complete example here, instead of a snippet? If it's too much, then perhaps a note with a link to simple-streaming.yaml?


pkg/deployment/collector.go, line 69 at r6 (raw file):

	storageType := c.jaeger.Spec.Storage.Type
	// If sttategy is "streaming", then change storage type

typo: s/sttategy/strategy


pkg/deployment/ingester.go, line 27 at r1 (raw file):

Previously, objectiser (Gary Brown) wrote…

Possibly, but currently this is consistent with the other components. So should be handled in one go in another PR.

+1, this is being handled in another PR (the one that adds the Update capabilities)


pkg/deployment/ingester.go, line 31 at r6 (raw file):

	if jaeger.Spec.Ingester.Image == "" {
		jaeger.Spec.Ingester.Image = fmt.Sprintf("%s:%s", viper.GetString("jaeger-ingester-image"), viper.GetString("jaeger-version"))

This will create a conflict with #174


pkg/strategy/controller.go, line 39 at r6 (raw file):

	}

	if strings.ToLower(jaeger.Spec.Strategy) == "streaming" {

Could you create an issue where we replace all the ToLower(...) == ... with the EqualFold() that you used elsewhere in this PR?


pkg/strategy/streaming.go, line 36 at r6 (raw file):

}

func (c *streamingStrategy) Create() []runtime.Object {

Is this the same as the production + ingester.Get()? I wonder if it would make sense to reuse that. Perhaps not, as the code is very simple and it might be easier/more maintainable to duplicate it, but perhaps you've given some thought about it?

Copy link
Contributor Author

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @jpkrohling and @pavolloffay)


README.adoc, line 190 at r6 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

Would it be too much to include a complete example here, instead of a snippet? If it's too much, then perhaps a note with a link to simple-streaming.yaml?

Done.


pkg/deployment/collector.go, line 70 at r1 (raw file):

Previously, objectiser (Gary Brown) wrote…

This was the easiest approach, as I believe ingester options will always be required - but is not intuitive. As mentioned in the PR description, I think a better approach would be to have more distinct strategies that effectively define the architectural structure being used. So all-in-one, another for the current production setup, and a new third option used to represent the kafka based approach.

Now explicit 'streaming' strategy is used to determine whether ingester is required.


pkg/deployment/collector.go, line 69 at r6 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

typo: s/sttategy/strategy

Done.


pkg/deployment/collector_test.go, line 289 at r1 (raw file):

Previously, pavolloffay (Pavol Loffay) wrote…

applies to other places in this repo as well

Done.


pkg/deployment/ingester.go, line 27 at r1 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

+1, this is being handled in another PR (the one that adds the Update capabilities)

Done.


pkg/deployment/ingester.go, line 31 at r6 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

This will create a conflict with #174

We can sort out the conflict depending which gets merged first.


pkg/strategy/controller.go, line 39 at r6 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

Could you create an issue where we replace all the ToLower(...) == ... with the EqualFold() that you used elsewhere in this PR?

Done #181


pkg/strategy/streaming.go, line 36 at r6 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

Is this the same as the production + ingester.Get()? I wonder if it would make sense to reuse that. Perhaps not, as the code is very simple and it might be easier/more maintainable to duplicate it, but perhaps you've given some thought about it?

Did consider whether could make reusable - but it is slightly different as kafka is the storage in collector. So thought for now better to keep separate - but can be refactored later, especially if we introduce other strategies that share common elements.

Signed-off-by: Gary Brown <[email protected]>
Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r7.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @objectiser and @pavolloffay)


pkg/strategy/streaming.go, line 36 at r6 (raw file):

Previously, objectiser (Gary Brown) wrote…

Did consider whether could make reusable - but it is slightly different as kafka is the storage in collector. So thought for now better to keep separate - but can be refactored later, especially if we introduce other strategies that share common elements.

No, that's fine. At most, a comment in the code would be nice, but it's not required.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @pavolloffay)

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r8.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @pavolloffay)

Copy link
Contributor Author

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @pavolloffay)


pkg/deployment/collector.go, line 70 at r1 (raw file):

Previously, objectiser (Gary Brown) wrote…

Now explicit 'streaming' strategy is used to determine whether ingester is required.

Done.

@objectiser objectiser merged commit b71a4ee into jaegertracing:master Jan 28, 2019
@objectiser
Copy link
Contributor Author

@pavolloffay don't think any of your remaining points were blockers - but can be dealt with in separate PR if they are still issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants